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Preface 


Richard  W.  Barker 

The  University  of  Texas  at  Austin,  1991 

Supervising  Professors;  Melvin  J.  Hinich  and  Georgia- Ann  Klutke 

This  study  balances  the  development  of  theory  and  its  application  to  real 
and  simulated  incipient  fault  data  from  systems  which  have  cyclostationary  prop¬ 
erties.  The  study's  theoretical  contribution  reveals  the  advantages  of  approaching 
estimation  of  time  series  in  a  general  framework  where  estimation  of  the  cumulant 
spectrum  can  reveal  implications  for  three  classes  of  stochastic  processes:  station¬ 
ary,  cyclostationary,  and  nonslalionary.  The  developed  cumulant  spectrum  esti¬ 
mation  capability  provides  estimates  for  feature  construction  in  addition  to 
bispectrum  and  power  spectrum  estimates  of  stochastic  process  data.  Actual  ex¬ 
perimental  data  is  obtained  to  study  the  incipient  wear  process  of  manufacturing 
drill  bits  cutting  through  epoxy-glass  composite  material  used  for  construction  of 
electronic  semiconductor  panels.  The  fluctuating  vibrations  caused  by  the  drill  hits 
cutting  through  the  epoxy-glass  composite  arc  not  subject  to  precise  prediction,  nor 
are  the  external  noise,  measurement  errors,  and  other  disturbances  in  the  trans¬ 
mission  of  the  vibration  signal  to  three  accelerometers  mounted  on  the  drilling 
machine  considered  to  have  the  same  characteristic  of  unpredictability.  Tven 
though  there  is  some  element  of  determinism  in  the  generated  signal  data  due  to  the 
common  periodic  excitation  of  the  rotating  drill  spindle,  the  vibration  signals  and 


noises  do  vary  with  time.  The  randomness  which  exists  from  sample  function  to 
sample  function  throughout  a  complete  ensemble  (inherent  sampling  variability)  is 
a  characteristic  of  any  stochastic  process.  But  there  is  also  a  randomness  from  time 
instant  to  time  instant  from  an  object  sample  function  to  the  .same  sample  function 
as  the  object  wears  over  time.  This  is  the  other  element  of  randomness  that  is  of 
primary  focus  in  this  research.  The  application  portion  of  the  study  consists  of 
pattern  recognition  analyses  of  simulated  and  actual  experimental  data  to  determine 
the  incipient  fault  discrimination  and  classification  ability  of  classifiers  using  fea¬ 
tures  with  and  without  higher-order  statistical  (HOS)  information.  Exploitation  of 
probabilistic  and  statistical  concepts  has  led  to  a  new  incipient  fault  detection  ap¬ 
proach  for  rotating  physical  sy.stems. 
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A  new  analytical  approach  is  developed  for  detecting  incipient  faults  of 
rotating  machinery  whose  periodical  characteristics  generate  time  series  data  repre¬ 
sentable  as  cyclostationary  processes.  The  new  approach  is  a  higher-order  statis¬ 
tical  (IlOS)  method  as  nonstationary  time  scries  estimation,  in  addition  to 
stationary  and  nonlinear  estimation,  provide  the  basis  for  enhanced  feature  infor¬ 
mation  of  the  random  fault  mechanisms  under  study.  An  algorithm  selects  and 
combines  different  transformed  estimates  of  the  raw  time  series,  second-order 
cumulant  spectrum  (nonstationary),  power  spectrum  (stationary),  and  bispectrum 
(nonlinear),  for  investigation  of  incipient  fault  discrimination  and  classification 
power  of  multivariate  classifiers  using  different  extracted  feature  information  sets. 
The  HOS  approach  (cumulant  spectrum,  bispectrum,  and  power  spectrum),  is 
tested  and  evaluated  against  a  traditional  power  spectrum  approach  with  simulated 
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and  actual  experimental  data.  Robustness  of  the  HOS  approach  is  first  investigated 
in  simulated  time  series  signals  with  amplitude  and  phase  modulation  indices  and 
differing  levels  of  additive  Gaussian  noise  as  parameters.  Simulations  .show  that 
use  of  flOS  features  improves  incipient  fault  detection  capability  of  a  linear 
classifier  and  is  less  sensitive  to  Gaussian  noise  within  the  signal  environment. 
Actual  vibration  signals  from  a  rotating  drill  wear  monitoring  study  are  also  ana¬ 
lyzed.  The  drills  are  used  in  the  manufacturing  of  electronic  circuit  cards  from 
epoxy-glass  composite.  Combining  HOS  features  with  power  spectrum  features 
improved  the  overall  classification  performance  of  parametric  and  non-parametric 
classifiers.  Additionally,  the  HOS  approach  is  less  sensitive  to  changes  in  drilling 
process  parameters  such  as  circuit  card  construction  and  chip  load.  The  pattern 
recognition  analyses  performed  in  this  re,search  provide  strong  statistical  evidence 
that  HOS  estimation  and  feature  extraction  is  beneficial  for  discrimination  and 
classification  of  incipient  failures  of  rotating  tools,  a  difficult  mechanical  system 
monitoring  problem. 
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Chapter  1 
Introduction 


I .  I  Introduction 

This  dissertation  is  concerned  with  the  problem  of  detecting  incipient  faults 
of  rotating  machinery.  Because  of  their  periodic  nature,  these  types  of  physical 
systems  are  mathematically  represented  as  cyclostationary  processes.  Rotating  ma¬ 
chine  research  studies  have  proposed  various  monitoring  methods  (Micheletti,  1976, 
and  Fetly,  1984)  but  for  reasons  such  as  instrumentation  difficulties  in  obtaining 
measurements  at  or  near  the  cutting  surface  of  rotating  tools,  implementation  of 
most  monitoring  methods  in  industry  is  limited.  Some  of  the  better  monitoring 
techniques  appear  within  the  vibration  analysis  literature  (Braun,  1986,  and  Shives 
and  Mertaugh,  1986)  where  vibration  monitoring  is  shown  to  .significantly  reduce 
the  cost  of  maintenance,  increase  reliability,  and  decrease  the  probability  of  cat¬ 
astrophic  failure  of  rotating  machinery.  Milner  (1988)  lists  hispectrum  analysis,  a 
particular  higher-order  statistical  (HOS)  method,  as  a  possible  approach  for  moni¬ 
toring  vibration  of  small  rotating  machines  in  a  NASA  spacecraft.  However,  he 
did  not  investigate  bispectrum  analysis  due  to  the  lack  of  adequate  computational 
methods,  nos  methods  are  defined  in  this  study  as  statistical  approaches  which 
analyze  stochastic  processes  and  their  generated  time  series  data  associated  with 
nonlinear  and  also  nonstationary  phenomena.  The  bispectrum  provides  a  first 
glimpse  at  nonlinear  effects  as  it  is  the  Tourier  transform  of  the  third-order  moment 
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function  of  a  stochastic  process  while  the  power  spectrum,  the  Fourier  transform 
of  the  second-order  moment,  is  most  useful  in  problems  estimating  linear  processes. 
Recent  findings  (Dan  and  Mathew,  1990)  conclude  that  no  single  condition  moni¬ 
toring  method  appears  suitable  for  all  machine  operations  and  material  combina¬ 
tions.  Consequently,  condition  monitoring  research  is  better  directed  towards 
improving  instrumentation  effectiveness,  collecting  better  data  on  the  functional 
relationship  between  wear  and  measured  parameters,  and  developing  sensor  fusion 
methods  which  combine  data  from  different  sensors  and  features  to  improve  system 
monitoring  accuracy. 

There  are  many  examples  of  models  using  a  combination  of  sensors  and 
signal  features  for  monitoring  rotating  tool  wear.  A  vector  autoregressive  moving 
average  model  developed  by  Yao  (1990)  used  three  axis  tool  force  measurements  to 
estimate  tool  wear  in  turning  of  steel.  Spindle  vibration,  cutting  torque,  and  force 
in  monitoring  of  milling  were  input  to  power  spectrum  analyses  to  extract  peak 
values  which  were  then  input  to  a  linear  classifier  (Elbestawi,  1989).  A  linear 
classifier  was  also  used  for  detecting  crankshaft  drill  wear  (Liu  and  Wiu,  1990)  using 
thrust  force  and  axial  acceleration  amplitude  signals.  Acoustic  emission  spectrum 
features  and  cutting  force  signals  input  to  a  neural  network  classifier  demonstrated 
the  applicability  of  neural  networks  for  noise  suppression  and  also  that  there  are 
an  optimal  number  of  features  (Rangwala  and  Dornfeld,  1990)  for  classification 
purposes.  Time  and  frequency  domain  characteristics  of  drilling  forces  for  carbon 
steel  (Braun  and  Lenz,  1986)  used  a  feature  based  on  probability  distribution  mo¬ 
ments  of  intensities  and  times  of  occurrences  of  a  single  oscillating  signal  pattern. 
Braun  and  Lenz  (1986)  also  stated  that  the  choice  of  appropriate  features,  whether 
single  or  combined,  need  to  be  based  on  test  results  or  experimental  databases. 
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I  hese  studies  are  a  few  examples  of  recent  sensor  fusion  techniques  in  milling, 
drilling  of  metals,  and  turning  operations.  Significantly  with  regard  to  this  research, 
feature  construction  of  .sensor  signals  in  these  recent  studies  is  limited  to  the  power 
spectrum  rather  than  any  higher-order  forms  of  spectra. 

However,  one  study  using  bispectrum  analysis  as  a  HOS  technique  to  di¬ 
agnose  abnormal  states  of  a  machine  from  the  normal  one  was  conducted  on  gear 
noise  signal  data  (Sato  et  al.,  1977).  Its  re.sults  showed  that  the  gear  noise  signals 
were  almost  periodical  under  proper  loading  and  normal  operating  conditions.  But 
when  heavy  load  conditions  scored  the  gear  .surfaces,  the  periodic  signal  character¬ 
istics  were  reduced  and  the  signals  appeared  more  random.  This  change  in  ran¬ 
domness  caused  the  modulus  of  the  bicohercnce  function,  defined  as  the  normalized 
bispectrum  with  respect  to  power  spectra,  to  decrease  significantly.  The  more  exact 
diagnosis  which  considered  the  nonstationary  properties  of  the  noises  was  left  as 
future  work,  and  an  experimental  design  strategy  with  an  associated  statistical 
classification  approach  also  was  not  evident  in  this  first  HOS  monitoring  approach. 
This  first  HOS  approach  also  investigated  severe  faults  rather  than  incipient  faults. 
Incipient  faults  are  those  failures  which  arc  just  beginning  to  appear  in  the  me¬ 
chanical  system. 

To  overcome  the  deficiencies  of  this  first  HOS  monitoring  study  this  re¬ 
search  developed  time  scries  estimation  procedures  based  on  a  nonstationary  or 
cumulant  spectrton  representation  of  the  stochastic  process  under  study.  An  exper¬ 
imental  design  strategy  and  statistical  pattern  recognition  framework  was  imple¬ 
mented  to  allow  strong  inferences  from  the  data  anaiysc,":  Furthermore,  the 
developed  HOS  approach  is  evaluated  for  its  abditv  to  rlctect  incipient,  rather  than 
severe,  faults,  fhus,  the  types  of  monitoring  problems  addressed  in  this  rc.scarch 
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are  more  difficult  than  those  previously  studied.  The  developed  HOS  approach 
combines  difTerent  forms  of  spectrum  measurements  (power,  second-order 
cumulant,  and  bispectrum)  from  sensors  in  a  statistical  classification  scheme  not 
only  to  improve  a  monitoring  system  s  classification  performance,  but  also  to  reduce 
its  sensitivity  to  variables  other  than  machine  condition.  These  variables  include, 
but  are  not  limited  to,  process  environment  parameters  such  as  workpiece  material 
construction,  cutting  conditions,  and  noi.se.  The  developed  HOS  approach  is  a  new 
type  of  sensor  fusion  technique  which  Dan  and  Mathew  (1990)  state  as  one  of  lue 
most  important  open  areas  in  condition  monitoring  research. 

1 .2  Problem  Statement  and  Scope 

The  goal  of  this  study  was  development  of  a  new  analytical  approach  for 
detecting  incipient  faults  in  physical  systems  which  have  a  periodic  driving  force 
mechanism  generating  potential  signature  data.  The  approach  is  the  first  to  incor¬ 
porate  nonatationary  (second-order  cumulant  spectrum)  in  addition  to  nonlinear 
(bispectrum)  and  linear  (power  spectrum)  characteristics  of  signature  time  .series  for 
use  as  feature  sets  to  improve  the  discriminatory  power  of  a  multivariate  classifier. 
■Signature  denotes  signal  patterns  which  characterize  a  specific  system  state. 

Investigation  of  hispectrum  analysis  as  a  fault  detection  approach  is  moti¬ 
vated  by  the  fact  that  fault  processes  of  rotating  mechanical  structures  are  known 
to  generate  highly  nonlinear  time  .series  data  through  the  generation  of  sum  and 
differf’nce  frequencies  (Braun,  1986).  Nonlinearity  is  a  result  of  intermodulation 
between  the  frequency  components  of  the  driving  process  and  produces  spectra  with 
sideband  structure.  Without  phase  information,  the  presence  of  nonlinearities  is  not 
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detectable.  The  bispectrum  captures  this  relative  phase  information  among  fre¬ 
quency  components.  Investigation  of  cumulant  spectrum  analysis  is  motivated  by 
the  fact  that  the  signal  data  generated  by  faults  in  physical  systems  under  study  is 
not  only  nonlinear,  but  also  nonstationary  due  to  the  modulation  effects  of  the 
random  fault  mechanisms. 

A  good  condition  monitoring  approach  is  insensitive  to  parameter  changes, 
noise  disturbances,  and  nonlinearities  which  are  intrinsic  to  the  random  processes 
under  study.  So  evaluation  of  the  developed  IIOS  approach  includes  marginal  and 
sensitivity  studies  of  both  simulated  and  actual  experimental  databases.  Marginal 
analyses  determined  the  incremental  value  of  IIOS  features  to  power  spectrum  fea¬ 
tures  for  discrimination  and  classification  tasks.  Sensitivity  analyses  determined  the 
impact  of  different  classification  algorithms,  stochastic  process  parameters,  and 
noise  on  classification  performance  of  classifiers  utilizing  spectral  feature  sets  with, 
and  without,  HOS  information. 

Simulation  experiments  of  modulated  signals  explored  potential  robustness 
properties  of  the  new  IIOS  approach.  Single  tone  amplitude  and  phase  modulation 
indices  of  a  cosine-wave  carrier  signal  (representing  the  periodic  driving  force  of  a 
rotating  machine  system)  and  standard  deviation  of  Gaussian  noise  are  the  simu¬ 
lation  parameters.  Incipient  faults  such  as  initial  wear  of  rotating  machinery  can 
appear  as  amplitude  and  phase  modulation  changes.  More  emphasis  is  directed  to 
changes  in  phase  modulation  as  amplitude  modulation  changes  are  assumed  related 
more  directly  to  deviations  in  the  process  environment  such  as  differences  in 
workpiece  properties  and  cutting  parameters  rather  than  slight  changes  in  process 
state.  Single  tone  modulation  and  the  values  chosen  for  the  modulation  index  pa¬ 
rameters  should  not  limit  the  applicability  of  the  simulation  study  results.  In- 
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creasing  the  complexity  of  the  signal  modulation  simulations  would  generate 
additional  frequency  interactions  and  modulations  and  consequently  provide  more 
frequency  support  in  each  of  the  higher-order  spectral  principal  domain  regions. 
Hence,  the  possibility  of  strengthening,  rather  than  weakening,  the  value  of  the 
nos  approach  is  afforded  by  increa.sing  the  complexity  of  the  modulation  simu¬ 
lation  experiments.  Analyses  of  simulated  data  provide  a  first  step  in  developing 
estimates  of  actual  classification  error  rates  and  also  allow  an  evaluation  of  the  im¬ 
pact  of  Gaussian  noise  on  classification  using  feature  sets  with  and  without  IIOS 
information. 

Since  not  much  condition  monitoring  research  addresses  high-speed  circuit 
card  drilling  of  epoxy-glass  composite,  IBM  (Austin)  conducted  an  experimental 
drill  wear  study.  Ramirez,  (1991)  discusses  the  IBM  circuit  card  manufacturing 
process  and  drilling  mechanics  which  generated  the  experimental  drill  wear  data. 
An  indirect  online  wear  monitoring  approach  using  drill  spindle  acceleration,  dis¬ 
placement,  and  speed  responses  was  investigated.  A  major  conclusion  of  the 
Ramirez  (1991)  study  was  particular  vibration  power  spectrum  harmonics  from  the 
thrust  axis  accelerometer  were  the  most  useful  responses  for  drill  wear  monitoring. 
Also,  since  circuit  card  material  composition  plays  a  key  role  in  generating  vi¬ 
bration,  variations  in  card  construction  can  mask  the  effects  of  wear  of  vibration 
power  spectra.  The  developed  1K)S  approach  is  investigated  for  its  potential  use 
in  the  industrial  environment  by  analyzing  IBM  experimental  drill  wear  data  of 
three  factors:  drill  bit  age.  circuit  card  stack  material,  and  chip  load  cutting  condi¬ 
tion.  Accelerometer  data  obtained  were  from  three  axial  positions  (.X.Y,  and  Z) 
gathered  on  two  types  of  bits  defined  by  their  number  of  circuit  card  holes  drilled 
(0  and  8000),  two  types  of  stack  materials  (NIP  and  6.S2P),  and  two  types  of  chip 
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load  (3  and  4  mil/rev).  Chip  load  is  the  amount  of  axial  distance  travelled  by  the 
drill  bit  tip  in  a  single  revolution  or  rotation.  Actual  wear  data  analyses  will  dem¬ 
onstrate  the  marginal  contribution  of  IIOS  features  to  power  spectrum  features  for 
detecting  incipietit  faults  of  manufacturing  drill  bits.  Actual  wear  data  analyses  will 
also  add  supportive  evidence  for  further  investigation  and  possible  implementation 
of  the  new  HOS  approach  in  an  industrial  environment. 

Both  simulated  and  actual  time  scries  data  represent  incipient  failure  con¬ 
ditions  rather  than  new  and  definitely  worn  conditions  of  a  rotating  machine  proc¬ 
ess.  Intuitively,  it  should  be  harder  to  detect  slight  or  moderate  wear  than  advanced 
wear  of  rotating  machinery.  This  is  logical  as  signals  used  to  characterize  advanced 
wear  are  usually  more  pronounced  than  those  signals  characterizing  slight  wear. 
Wear  condition  of  drill  bits  from  the  IBM  experimental  study  were  optically 
checked  under  a  microscope  to  accurately  classify  wear  states.  Because  time  series 
waveforms  arc  already  grouped  for  their  “similarity”,  cluster  analyses  arc  not 
needed.  .Simulated  and  actual  experimental  data  analyzed  in  this  study  arc  highly 
non-(raussian  and  nonlinear  based  on  the  Ilinich  (1982)  bispectrum  statistical  tests. 
Hence,  feature  extraction  rather  than  an  optimality  approach  (Shumway.  1982)  is 
the  technique  used  for  time  scries  rliscrimination  and  classification. 

Both  background  noise  and  signal  propagation  media  interfere  with  signa¬ 
ture  signals.  Although  there  is  some  determinism  due  to  the  rotating  machine's 
periodic  driving  force  mechanism,  each  of  the  signal  types  (noise,  propagation,  and 
signature)  is  characterized  by  an  clement  of  unpredictability.  Hence,  an  ensemble 
of  signals  for  different  states  of  cyclostationary  processes  arc  analyzed  to  ensure  an 
effective  study  of  alternative  classification  approaches.  Probability  of  false  alarm 
and  probability  of  detection  arc  the  main  performance  measures,  fhese  measures 
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are  averaged  over  the  signal  ensembles  to  decrease  the  variability  of  these  per¬ 
formance  estimates. 

1.3  General  Research  Approach  and  Presentation 

This  research  focused  on  the  development  of  two  new  methodologies: 
cumulant  spectrum  estimation  (second-order)  and  MOS  feature  extraction.  In 
Chapters  3  and  4,  the  new  methodological  developments  are  discussed  which  build 
upon  the  background  material  given  in  Chapter  2.  The  HOS  approach  developed 
in  this  work  is  tested  with  both  simulated  and  actual  physical  phenomena  to  inves¬ 
tigate  and  quantify  the  benefits  of  HOS  estimation  and  feature  extraction  for  in¬ 
cipient  fault  detection.  New  estimation  code  to  perform  second-order  and 
third-order  cumulant  estimation  of  time  series  is  developed.  So  besides  the 
bispectrum,  the  second-order  cumulant  spectrum  is  investigated  and  employed  in 
discrimination  and  classification  tasks.  Presentation  of  results  to  just  the  second- 
order  cumulant  spectrum  is  due  to  time  constraints  and  some  technical  problems. 
Because  a  large  number  of  measurements  result  from  the  spectral  transformations 
of  the  raw  time  scries,  a  IlOS  feature  extraction  algorithm  is  developed  to  combine 
the  most  useful  spectral  measurements  for  incipient  fault  identification.  Appropri¬ 
ate  measures  of  clTect’vcncs.s  to  evaluate  the  relative  merit  of  spectral  feature  sets, 
with  and  without  IlfXS  information,  arc  devised  for  both  simulated  and  actual  ex¬ 
perimental  studies.  These  measures  of  effectiveness  are  in  the  results  section  of 
Chapter  .5  afier  each  experiment  de.scription. 


Chapter  2 
Background 


2.1  Introduction 

Detailed  information  on  the  major  methodologies  investigated  to  develop 
a  new  analytical  approach  to  the  research  problem  is  given  in  this  chapter.  First  is 
a  description  of  existing  incipient  fault  detection  techniques  for  rotating  machinery 
using  vibration  signals.  Second  is  an  examination  of  the  statistical  theory  and 
models  which  permit  interpretation  of  multivariate  or  group  differences.  Some 
special  considerations  for  use  of  multivariate  approaches  for  time  series  discrimi¬ 
nation  and  classification  are  discussed.  Third,  different  types  of  stochastic  processes 
and  the  mathematical  functions  used  to  describe  them  are  defined.  Existing  theory 
related  to  higher-order  statistics  (HOS)  concludes  the  chapter. 

2.2  Existing  Incipient  Fault  Detection  Techniques 

Although  many  types  of  signals  are  used  for  diagnostic  monitoring  of  ro¬ 
tating  machinery,  there  arc  more  examples  of  the  demonstrated  use  and  success  of 
vibration  monitoring  for  significantly  reducing  the  cost  of  maintenance,  increasing 
reliability,  and  decreasing  the  probability  of  cata.strophic  failure  of  rotating  ma¬ 
chinery  (Braun,  1986,  and  Shives  and  Mertaugh,  1986).  One  success  is  TRACOR 
Applied  Science's  (Austin)  vibration  monitoring  program  for  the  United  States 
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Navy  to  improve  the  reliability  and  maintainability  of  the  rotating  machines  on 
their  TRIDENT  submarines  and  surface  ships  (Milner,  1990).  They  use  signature 
analysis  of  the  accelerometer  outputs,  a  common  vibration  monitoring  technique. 
Other  incipient  fault  detection  techniques  u.sing  vibration  signal  monitoring  include 
demodulation  of  high  frequency  acceleration  signals,  statistical  analysis  of  acceler¬ 
ation  amplitude,  process  modelling  or  parametric  approaches  such  as  auto¬ 
regressive  moving  average  (ARMA)  time  series  niodels,  phase-locked  processing, 
cepstrum  analysis,  transient  analysis,  Hilbert  transforms,  and  general  pattern  re¬ 
cognition.  These  major  techniques  are  summarized  for  a  general  understanding  of 
their  strengths  and  weaknesses.  Braun  (1986)  and  Shives  and  Mertaugh  (1986)  have 
complete  discussions  of  these  methods  including  schemes  that  combine  some  of 
them. 

Signature  analysis  of  acceleration  outputs  is  used  in  many  commercial  ap¬ 
plications  in  addition  to  TRACOR's  use  for  the  Navy.  Specific  topics  of  analysis 
bands,  resolution,  accelerometer  type  and  its  placement,  instrumentation,  and 
presentation  of  accelerometer  output  are  peculiar  to  the  particular  application. 
However,  a  common  thread  among  all  applications  is  the  reliance  on  association 
of  a  particular  failure  mode  with  features  of  the  vibration  power  spectrum.  Tones 
and  other  power  spectral  features  present  in  rotating  machinery  vibration  arc  gen¬ 
erally  due  to  predictable  causes.  There  are  many  published  relationships  of  faults 
versus  power  spectral  features  for  many  different  types  of  machines  and  their  com¬ 
ponents.  Braun  (1986)  contains  the  theory  and  applications  of  many  different 
methods  within  the  field  known  as  mechanical  signature  analysis.  Signature  analysis 
is  a  very  common  technique  as  it  has  general  applicability  and  proven  success  for 
a  large  variety  of  machine  types.  Also,  the  computation  of  the  spectral  amplitude 
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at  selected  frequencies  and  the  association  of  amplitude  increases  with  specific 
faults  is  a  necessary  first  step  of  several  other  techniques  (general  pattern  recogni¬ 
tion,  trend  analysis  and  ;  ocess  modelling  at  key  frequencies,  transient  and 
cepstrum  analysis). 

High  frequency  demodulation  of  acceleration  signals  extracts  relatively  low 
frequency  information  from  a  high  frequency  signal  that  has  been  amplitude  mod¬ 
ulated  by  a  mechanical  defect.  It  is  mostly  applied  for  bearing  fault  detection.  A' 
the  ,ery  early  stages  of  a  bearing  fault,  impulses  due  to  the  rolling  element  passing 
over  the  fault  will  be  very  short  in  duration  and  can  extend  as  high  as  300  kHz  (Bell 
et  al.,  198.5).  The  impulses  excite  resonant  modes  of  the  machine  and  the  envelope 
of  the  resulting  time  signai  is  the  amplitude  modulated  component  of  the  defect 
signal.  The  envelope  signal  will  contain  discrete  peaks  with  periodicities  deter¬ 
mined  by  the  input  rate  of  the  defect.  After  cfTcctively  bandpassing  the  signal, 
power  spectral  analysis  of  the  envelope  will  produce  a  harmonic  series  with  a  fun¬ 
damental  frequency  that  is  -  elated  to  the  bearing  frequencies.  Other  general  areas 
of  application  of  this  technique  include  fault  detection  of  gears  and  fluid  film 
bearings,  and  seal  rub  analysis  (Darlow  et  al.,  197.5  and  Drago,  1979).  Because  of 
the  high  frequency  range  used  -vith  li"';  'eti.u.quc,  there  is  a  high  defcei  signal-to- 
noise  ratio  which  is  often  stated  as  an  advantage.  However,  an  associated  disad¬ 
vantage  is  the  requirement  that  the  particular  frequency  within  the  high  range  must 
he  predetermined  before  filtering  and  demodulation  is  performed. 

Weighted  likelihood  ratio  processing  and  kurtosis  arc  two  statistical  tech¬ 
niques  used  to  process  amplitude  signals.  Weighted  likelihood  ratio  processing  is 
described  later  in  this  chapter  so  only  kurtosis  processing  is  described  here.  A 
“universal”  behavior  noticed  in  wear-induced  failures  is  that  locali/cd  defects  ap- 
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pear  first  and  distributed  types  of  defects  follow.  Hence,  induced  vibrations  often 
have  an  impulsive  character  with  the  appearance  of  a  localized  defect,  changing  to 
a  more  continuous  function  over  time.  The  sharp  peaks  at  the  onset  of  defects  af¬ 
fect  the  tails  of  a  probability  density  function  (pdf),  and  moments  of  the  distribution 
such  as  kurtosis  can  enhance  the  sensitivity  to  changes  occurring  at  the  pdf  tail. 
Kurtosis  is  the  normalized  fourth  moment  of  a  probability  density  function  and 
emphasizes  the  peakness  of  a  particular  signal  pattern.  Normalization  is  accom¬ 
plished  by  removing  the  mean  from  the  data  and  dividing  by  the  fourth  power  of 
the  standard  deviation.  Kurtosis  as  a  statistic  is  considered  as  an  indication  of 
fiaussian  versus  non-Gaussian  densities  as  it  is  equal  to  three  for  all  Gaussian 
densities.  One  example  of  the  practical  use  of  kurtosis  is  in  the  area  of  rolling  ele¬ 
ment  bearing  fault  condition  (Dyer  and  Stewart,  1978).  The  kurtosis  value  for  good 
bearings  followed  the  Gaussian  distribution  value  of  three  while  significantly  de¬ 
graded  bearings  had  large  variations  in  the  normalized  acceleration  distribution. 
1  hese  large  variations  led  to  kurtosis  values  significantly  different  from  three.  The 
authors  stated  more  tests  including  simulation  results  for  performance  evaluation 
are  needed  before  conclusive  remarks  can  be  made  on  kurtosis  as  a  fault  indicator. 

Process  models  arc  methods  of  detecting  changes  in  expected  waveform 
structure.  I  his  technique  generally  involves  mathematically  modelling  the  system 
outputs  to  determine  if  abnormalities  exist  in  the  signatures  by  statistically  com¬ 
paring  them  to  normal  model  output.  The  extraction  of  features  from  the 
parametric  spectrum  can  mimic  the  methods  applied  to  non-parametric  spectra. 
Another  feature  extraction  approach  is  directly  using  system  identified  parameters 
that  describe  the  data  (ie.  AR,  MA,  ARMA).  Glassification  of  automobile  engine 
fitults  in  a  production  assembly-line  using  a  nearest  neighbor  classifier  was  based 


on  this  latter  type  of  feature  extraction  approach  (Gersch,  1986).  The  Kullback- 
Leibler  measure  of  dissimilarity  was  employed  which  assumes  the  time  series  are 
Gaussian-distributed  (Kullback,  1959). 

An  approach  related  to  Kalman  filtering  methods  is  based  on  analysis  of 
residuals  after  fitting  of  the  parametric  model  to  data.  Variations  in  residual  mag¬ 
nitude,  or  statistical  distributions  different  from  normal  meaning  the  fitted  model 
is  no  longer  appropriate,  can  indicate  a  change  in  signal  patterns.  Specifically,  an 
approach  called  the  Dynamic  Data  Sy.stem  (DDS)  uses  operational  data  from  a 
mechanical  system  and  applies  ARMA  mathematical  models  to  extract  features 
from  the  data  with  a  high  degree  of  sensitivity.  The  DDS  model  is  combined  with 
statistical  quality  control  chart  concepts  to  monitor  for  abnormalities  with  a  very 
limited  amount  of  data  (Wu,  1977). 

Phase-locked  processing  describes  a  general  class  of  special  processing 
techniques  that  efficiently  extract  and  filter  periodic  signals.  Use  of  phase-locking 
gives  equivalent  results  in  both  time  and  frequency  domains.  This  technique  uses 
encoders  to  give  an  integer  number  of  pulses  per  revolution  (Braun  and  Seth,  1979). 
The  number  of  pulses  from  the  encoder  should  be  equal  to  two  to  the  power  of  the 
number  of  pulses  per  revolution  as  the  discrete  Fourier  transform  (DPT)  is  usually 
computed  with  a  Radix  2  fast  Fourier  transform.  The  rotationally  locked  compo¬ 
nents  are  located  at  multiple  points  in  the  DFT  indexed  hy  p  =  NjM  where  N  is 
the  number  of  pulses  analyzed  per  revolution  and  M  is  the  number  of  points  in  one 
period.  By  employing  a  filter  whose  respon.se  is  set  to  zero  for  all  p  ^  NjM  the  ex¬ 
traction  of  the  periodic  signal  from  additional  non-coherent  interferences  is 
achieved.  If  additional  signals  non-coherent  with  the  rotational  frequency  of  the 


machine  exist,  windowing  is  employed  to  minimize  errors  in  the  signal  extraction 
process  due  to  possible  leakage  problems. 

Cepstrum  analysis  is  used  in  echo  detection  and  deconvolution  problems. 
Braun  (1986)  has  a  detailed  discussion  of  the  use  and  problems  in  the  computation 
of  cepstra.  A  common  signal  processing  problem  is  the  analysis  of  signals  which 
are  composed  of  a  wavelet  and  one  or  more  echoes  which  may  overlap.  A  simple 
form  of  this  composite  signal  is  jf(/)  =  s(t)  +  aos{t  —  tn)-  Distortional  effects  such  as 
noise,  overlapping  of  echoes  and  the  wavelet,  and  different  transmission  paths  ob¬ 
scure  the  echo  arrival  time  and  basic  wavelet  shape.  The  signal  plus  echo  may  be 
modelled  as  the  convolution  of  s(t)  with  a  time  function  3{t)  +  aoS{t  —  to)  and  the 
separation  of  these  two  convolved  signals  is  performed  with  operations  in  the  power 
cepstrum  analysis.  For  example,  if  x(t)  =  s(0  x  then 

In  I  T(o»)  r  =  In  I  S((i})  |  ’  -f-  In  I  //(o))  1  ^  There  is  also  complex  cepstrum  analysis 

which  is  more  general  than  the  power  cepstrum  as  inverse  operations  can  recover 
the  original  time  signal.  Both  the  power  and  complex  cepstrum  methods  are  im¬ 
pacted  by  smoothness  and  bandwidth  of  the  wavelet.  Additive  noise  is  another 
major  degrading  influence  in  the  effectiveness  of  cepstral  methods.  A  wide  band¬ 
width  and  smooth  wavelet  spectrum  is  ncce.ssary  for  a  less  erratic  wavelet  cepstrum 
which  subsequently  helps  distinguish  echo  spikes  from  the  wavelet  cepstrum.  A 
majority  of  rotating  machinery  applications  using  cepstrum  analysis  are  on  gear 
faults  (Randall,  1982).  Generally,  it  has  been  determined  that  gears  in  good  con¬ 
dition  normally  contain  frequency  sidebands  of  nearly  constant  amplitude  over  time 
in  the  power  spectrum.  Changes  in  the  number  and  amplitude  of  the  sidebands  are 
proposed  as  indicative  of  a  deterioration  of  a  gear's  condition  and  the  cepstrum  is 
able  to  detect  this  change  with  an  increase  in  amplitude  of  a  single  line.  Thus,  an 


15 


advantage  of  using  a  cepstrum  approach  is  not  being  confounded  by  several  sets 
of  periodicities  in  the  power  spectrum  causing  dilTiculty  with  a  visual  interpretation 
of  the  data.  However,  Braun  (1986)  states  this  method  is  an  interesting  approach 
to  analysis  of  convolved  signals,  but  it  must  be  treated  with  caution  and  care  for  the 
interpretation  of  its  application  to  machinery  diagnostic  problems. 

Because  of  their  origins,  transient  signals  u.sually  have  different  durations, 
peak  amplitudes,  repetition  rates,  frequencies,  and  bandwidths.  Transient  signal 
detection  schemes  exploit  varying  degrees  of  a  priori  waveform  structural  informa¬ 
tion.  Most  transient  processors  perform  two  primary  functions:  event  capture  and 
transient  analysis  (Owsley  and  Quazi,  1970).  Rvent  capture  involves  continuous 
loop  data  recording  with  a  trigger  signal  that  causes  transfer  to  permanent  data 
storage.  The  trigger  signal  is  driven  by  a  simple  detector  of  energy  increases. 
Transient  analysis  depends  on  the  application  and  so  varies  significantly.  Fourier 
analysis  is  used  to  select  key  features  such  as  the  center  frequency  of  a  narrowband 
transient  and  its  bandwidth  to  classify  the  transient.  Other  extracted  features  in¬ 
clude  pulse  duration  and  repetition  rate  (Nolte,  1968). 

Hilbert  transforms  are  another  way  to  ca,sily  extract  envelope  information 
from  a  modulated  time  signal.  The  Hilbert  transform  differs  from  the  Fourier 
transform  because  it  leaves  the  .signal  in  its  original  domain.  It  shills  the  value  of 
a  time  signal  by  1/4  wavelength  or  a  90°  phase  shift  in  frequency  domain.  Bell  et 
al.  (1985)  use  the  Hilbert  transform  for  incipient  fault  detection  of  rolling  clement 
bearings. 

Many  examples  of  machinery  monitoring  systems  in  the  literature  can  be 
categorized  as  a  general  pattern  recognition  approach.  One  excellent  commercial 
example  is  the  statistically  based  system  developed  at  Oak  Ridge  National  Tabora- 
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tory  for  continuous,  on-line,  unattended  surveillance  of  dynamic  reactor  signals 
(Smith,  1983).  Their  monitoring  system  is  based  on  identification  of  changes  in  the 
power  spectrum  of  measured  variables  where  change  is  detected  by  using 
discriminant  functions  formulated  to  emphasize  relevant  features.  Discriminants 
were  constructed  to  detect  the  following:  (1)  a  fluctuation  in  the  integral  power  of 
the  spectrum;  (2)  spectral  shape  changes;  (3)  deviations  in  the  magnitude  of  indi¬ 
vidual  spectral  estimates  at  a  given  frequency;  and  (4)  shifts  in  the  frequency  of 
spectral  peaks.  Their  system,  typical  of  most  pattern  recognition  systems,  used 
classification  functions  based  on  Bayesian  estimation  decision  theory  preceded  by 
a  heuristic  feature  extraction  process  to  transform  the  raw  time  series  data.  What¬ 
ever  features  are  used,  the  determination  of  thresholds  is  usually  determined  by  ex¬ 
perience  where  monitoring  .systems  are  fine-tuned  as  more  information  on  the 
process  is  obtained.  Features  are  used  for  classification  purposes  and  their  statis¬ 
tical  properties  affect  monitoring  performance.  However,  few  references  or  studies 
describe  monitoring  systems  based  on  formal  statistical  aspects  because  of  the  dif¬ 
ficulty  of  acquiring  information  and  databases  from  sufficiently  large  sample  pop¬ 
ulations  (Paul,  1977).  Statistical  pattern  recognition  is  the  general  framework  of 
this  research  and  simulation  and  actual  time  series  databases  are  from  sufficiently 
large  sample  populations.  The  statistical  approach  employed  in  this  research  is 
unique  for  constructed  features  arc  not  restricted  to  power  spectrum  estimates,  but 
also  include  estimates  of  two  higher-order  spectrum  forms. 

2.3  Measuring  Differences  Among  Multivariate  Populations 


Since  the  developed  monitoring  approach  is  described  and  evaluated  from 
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formal  statistical  aspects,  this  background  section  first  discusses  the  types  of  anal¬ 
ysis  questions  that  arise  when  confronted  with  the  problem  of  measuring  difTcrences 
among  multivariate  populations.  Decision  theory  definitions  introduce  the  theore¬ 
tical  basis  underlying  discrimination  and  classification  tasks.  Estimation  of  class- 
conditional  probability  density  functions  (pdfs)  or  discriminant  functions  under 
various  levels  of  assumption  are  discussed  and  compared.  Performance  assessment 
issues  of  the  developed  feature  extraction  sets  input  to  multivariate  classifiers  using 
design  and  test  sets  are  discussed.  Mathematical  details  that  address  the  partition¬ 
ing  of  the  total  sample  variance,  a  fundamental  step  in  the  development  of  tech¬ 
niques  which  separate  multivariate  populations  and  statistical  considerations  in 
measuring  population  differences,  conclude  the  background  section. 

Several  analysis  questions  are  postulated  when  investigating  multivariate 
group  differences.  First,  are  the  groups  significantly  different  with  respect  to  their 
multivariate  descriptions?  This  is  a  multivariate  equivalent  to  the  sample 
(univariate)  t-test  on  population  means.  A  sample  mean  vector,  or  centroid,  for 
each  population  is  formed,  and  the  null  hypothesis  of  equal  population  centroids 
is  tested  using  Hotelling's  T^statistic,  or  equivalently,  Wilks'  A  statistic  when  con¬ 
sidering  only  two  groups.  If  more  than  two  groups  or  populations  are  involved, 
multivariate  analysis  of  variance  looks  for  differences  among  population  centroids. 
Second,  what  role  do  the  measurement  variables  play  in  separating  the  groups?  A 
discriminant  function  which  can  be  a  linear,  quadratic  or  some  other  transformation 
of  the  measured  variables  answers  this  question.  Its  evaluation  objective  is  to  yield 
similar  values  for  cases  from  the  same  population  and  different  values  for  cases 
from  different  populations.  lixamining  the  discriminant  function  provides  insight 
on  which  measurement  or  feature  variables  arc  most  important  in  separating  the 
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groups.  The  population  separation  problem  using  only  information  about  one  of 
the  variables  at  a  time  usually  is  not  very  efTicient  and  is  suboptimal.  For  example, 
two  individual  variables  may  not  be  good  discriminators  by  themselves,  but  when 
combined  they  may  he  highly  elTective.  Developing  a  discriminant  function  corre¬ 
sponds  to  the  search  for  a  vantage  point  which  provides  a  view  with  maximum 
group  or  population  separation.  This  underlies  the  motivation  for  performing  HOS 
estimation  and  feature  extraction  in  addition  to  traditional  power  spectrum  meth¬ 
ods.  It  is  conjectured  that  higher-order  forms  of  spectra  combined  with  power 
spectra  will  provide  a  better  vantage  point.  Moreover,  HOS  features  will  just  sim¬ 
ply  be  better  discriminators  than  power  spectrum  features.  Discriminant  functions 
can  be  constructed  using  stepwise  selection  of  variables  similar  to  stepwise  selection 
of  variables  used  in  multiple  regression.  When  there  arc  more  than  two  groups, 
multiple  discriminant  functions  can  be  developed  (beyond  this  study's  scope). 
Third,  if  responses  or  measurements  of  the  variables  are  known  for  a  new  observa¬ 
tion,  to  which  group  does  the  case  belong?  This  is  the  multivariate  classification 
problem  while  the  first  two  questions  concerned  multivariate  discrimination.  In 
many  applications  of  discriminant  analysis,  classification  is  the  major  objective. 
For  example,  if  there  is  a  description  of  new  drill  bits  and  slightly  used  drill  bits  in 
terms  of  spectral  features  calculated  at  times  of  different  wear  states,  these  spectral 
features  can  then  be  used  in  classification  rules  which  would  specify  whether  an¬ 
other  drill  bit  is  a  member  of  one  of  the  wear  categories.  Thus,  classification  rules 
are  developed  from  the  discriminant  functions. 

Consider  definitions  from  decision  theory  to  explain  the  basis  underlying 
discrimination  and  classification.  A  decision  rule  partitions  a  space  into  regions 
£2,,  i I,  .  .  N  where  N  is  the  number  of  classes.  An  object,  or  time  series,  is 
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classified  as  coming  from  class  fo»  if  its  corresponding  vector  representation,  x  lies 
in  region  Q*.  The  vector  representation  x  can  either  represent  direct  time  series 
measurements  or  features,  <i)  (X/),  which  are  functions  of  the  x,.  The  boundaries 
between  regions  are  called  decision  surfaces.  Assume  that  prior  probabilities, 
are  known  that  an  object  comes  from  class  ru,  (/  =  I, ... ,  N).  Information  in 
the  form  of  a  vector,  x,  is  then  determined  for  an  object  to  be  classified.  The  Bayes 
minimum  error  rule  is  formed  by  comparing  the  posterior  probabilities  of  belongin.g 
to  each  class  using  the  information  vector  and  classify  according  to  whichever  is 
larger; 


P  (a»fe  I  x)  >  /*  {(Oj  I  x)  for  all  j  k  -+  x  6  Q*. 

Since  the  posterior  probabilities  are  rarely  known,  they  need  to  be  estimated  from 
samples  of  known  classification.  Another  formulation  of  the  Bayes  minimum  error 
rule  is  obtained  through  application  of  Bayes  Theorem  to  determine  the  class 
membership  probabilities: 


P  {<o,  I  x) 


p  (x  I  Mi)P  (a>,) 
/’(x) 


which  results  with 

Z’ (x  I  fu;,)  r  (o)<,)  >  p  (x  I  fi>,) for  ally  A: -►  X  6  Qfe.  [1] 

I  f  p  (x  I  (O).  the  class-conditional  pdfs  arc  known,  the  problem  is  solved  by  substi¬ 
tution  of  X  into  [I]  for  the  time  scries  being  classified  and  finding  the  largest  value 
of  /?  (x  I  o),)  PUo,).  But  similar  to  P  ((o,  \  x).  the  p  (x  |  ro,)  arc  probably  unknown  and 
require  estimation  from  a  set  of  classified  samples. 
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Bayes  minimum  error  rule  for  the  two  case  situation  is: 

p(x|  tu,)  ^  P(a}2) 
p(x  I  a>2)  <  P((o,) 

This  rule  minimizes  the  overall  error  assuming  equal  misclassification  costs  but  for 
industrial  manufacturing  situations  where  misclassifying  a  worn  tool  may  be  more 
serious  than  misclassifying  a  new  tool,  a  different  criterion  which  considers  the  dif¬ 
ferent  misclassification  costs  (Bayes  minimum  risk)  may  be  more  appropriate.  Ad¬ 
ditionally,  if  the  prior  probabilities  of  a  new  time  series  are  unknown,  a  minimax 
rule  designed  to  minimize  the  maximum  possible  risk  is  used.  Hand  (1981)  develops 
ail  three  rules  expressed  as  functions  of  x  using  the  class-conditional  pdfs  p  (x  |  Wi). 
Considering  the  absolute  values  of  the  probabilities  not  as  relevant  as  their  relative 
magnitudes  allows  more  general  rules.  For  the  two-class  situation  the  general  rule 
is: 

/i(x)^  constant  ^  x  e 

where  h  is  called  a  discriminant  function.  As  before,  the  discriminant  function  will 
require  estimation  from  classified  samples.  Fstimation  procedures  are  categorized 
by  the  level  of  assumptions  used  for  the  likelihood  function;  parametric  and  non- 
parametric.  Non-paramctric  approaches  estimate  the  class-conditional  pdfs  or  the 
discriminant  functions  without  any  knowledge  about  their  parametric  form.  In 
parametric  approaches,  assumptions  arc  made  about  the  form  of  the  class- 
conditional  pdfs  or  discriminant  functions  and  estimation  of  the  unknown  func¬ 
tional  parameters  arc  performed  with  the  classified  samples.  Parametric  and 
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non-parametric  estimation  methods  applied  in  this  research  study  are  discussed 
next. 

2.3.1  Estimation  Methods 

If  the  class-conditional  pdfs  or  discriminant  functions  forms  are  known, 
then  the  tasks  of  discrimination  and  classification  are  simplified  as  likelihood  func¬ 
tion  ratios  with  various  risk  thresholds  are  compared  for  its  solution.  Unfortu¬ 
nately,  this  knowledge  rarely  exists  but  the  general  parametric  form  may  be  known 
from  some  theoretical  knowledge  or  from  a  study  of  the  sampling  distributions.  In 
this  situation,  samples  are  used  to  give  estimates  of  parameters  of  the  class- 
conditional  pdfs  or  more  generally,  sample  distributions  are  used  to  estimate  the 
parameters  of  the  discriminant  functions.  However,  when  simplifying  assumptions 
are  not  defendable,  non-parametric  methods  are  also  applied.  Lachenbruch  (1975) 
and  Hand  (1981)  outline  and  compare  various  parametric  and  non-parametric  pdf 
estimation  methods.  After  preliminary  experimentation  and  application  of  several 
of  these  estimation  methods,  three  were  chosen  for  their  consistent  classification 
performance  of  the  experimental  data  described  in  Chapter  5.  The  k-neare.st- 
neighbor  (k-NN)  method  was  the  non-paramclric  method  applied  to  actual  time 
scries  data.  Two  parametric  approaches,  linear  and  quadratic  discriminant  func¬ 
tions,  were  also  applied  to  the  actual  data.  Linear  discriminant  functions  were 
constructed  for  the  simulated  experiments. 

Assuming  a  multivariate  normal  distribution  for  the  spectral  features  of 
each  time  series  class  resulted  in  discriminant  rules  based  on  the  pooled  covariance 
matrix  (yielding  a  linear  function)  or  the  individual  within-group  covariance  matri- 


ces  (yielding  a  quadratic  function).  Feature  measurements  are  placed  in  the  class 
from  which  it  has  the  smallest  generalized  squared  distance  or  the  largest  posterior 
probability. 
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The  squared  distance  from  feature  vector  x  to  class  w  is 
4(x)  =  (x  -  xJ'V“'(x  -  X  J 

with  V„  being  the  pooled  or  wilhin-class  covariance  matrix  and  x„  being  the  feature 
variable  means  in  class  w.  The  class  specific  density  at  x  from  class  cd  is: 

fJS)  =  (2;r)-'’''|F„r'''exp(-.5rf„'(x)) 

and  posterior  probability  of  x  belonging  to  class  cu  is  computed  by  applying  Bayes 
Theorem: 


p{w  I  x) 


where  the  summation  is  over  all  the  classes. 

Now,  the  generalized  square  distance  from  x  to  class  w  is 

=  <„Yx)+^lM  +  g2M 


with  gi(fu)  =  log.|F,„|  if  within-class  covariances  are  used  (quadratic  function), 
g,(w)  =  0  if  pooled  covariance  matrix  is  used  (linear  function),  =  —  2  log,(<7,„) 
if  prior  probabilities  are  unequal,  and  gj((i>)  =  0  if  prior  probabilities  are  equal. 
The  posterior  probability  of  x  belonging  to  class  m  is  then 
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exp(  -.5DI(x)) 

-.5DI(x)) 

Q 

Thus,  an  observation,  or  feature  variable  set,  is  classified  into  class  i2  if  setting  w 
equal  to  Q  produces  the  largest  posterior  probability  or  smallest  value  of  DJ{x). 
The  difference  between  the  generalized  squared  distances  of  the  class  means  is  the 
squared  Mahalanobis  distance  measure. 

Non-parametric  estimates  of  class-specific  probability  densities  of  feature 
sets  are  computed  with  a  k-nearest  neighbor  approach.  Squared  Mahalanobis  dis¬ 
tance  calculated  from  the  pooled  covariance  matrices  is  used  to  determine  proxim¬ 
ity.  The  k-nearest  neighbor  does  not  have  a  complicated  approach  to  its  selection 
of  the  smoothing  parameter,  k,  as  it  is  based  on  which  gives  the  best  classification 
performance.  Following  Hand  (1981),  consider  the  probability  that  a  point  will  fall 
in  a  local  neighborhood  L  of  x  for  the  multivariate  pdf  p  (x  |  a)„)  as 


0  =  h’  (y  I  Wm)  d  y. 

’’L 

The  following  approximation  is  made  iff  is  small  and  has  volume  V: 

0^p{x\  (o„) .  V 


which  yields 


p{x\coJ^0IV. 
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A  pdf  estimator  for  6  is  then  computed  by  the  proportion  of  the  -ample  points 
falling  in  the  local  neighborhood  L.  Assign  k  to  denote  the  number  of  sample 
points  falling  in  and  obtain  d  =  kln„  which  then  leads  to  the  estimator  defined 
as; 


p(x\coJ  =  — . 

m 

The  volume  V  is  made  dependent  on  the  data  by  fixing  k  and  determining  V  needed 
to  enclose  the  k  nearest  points  to  x.  Next,  combine  all  the  classes'  sample  points 
into  one  set  of  n  points  such  that  X”-  =  The  hypersphere  of  volume  V  which 

m 

just  encloses  k  points  from  this  combined  set  is  found.  Now  consider  that  among 
the  k  points,  occur  from  class  a>„.  Thus,  a  k-NN  estimator  for  class  ru,  is  de¬ 
fined: 


P  (x  1  wj 


k 


m 


V  ■ 


^  n  k 

There  are  also  the  estimators  ^  and  p  (x)  =  — Application  of 

^  nV 

Bayes  Theorem  gives; 


P  {a)„  I  x) 


p  (x  I  (Ojp  {m„) 
P  (x) 


k 


nV 


So,  the  following  clas.sification  rule  is  generated:  classify  x  as  belonging  to  class  i  if 
k,  =  maxj/c,). 
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2.3. 1.1  Advantages  and  Disadvantages  of  Applied  Estimation  Methods 

A  disadvantage  of  k-nearest  neighbor  is  distance"  '''om  the  feature  vector 
to  all  of  the  sample  points  must  be  determined.  Hence,  all  of  the  sample  points 
must  be  retained  and  this  can  increase  the  amount  of  computer  time  for  classifica¬ 
tion.  However,  theie  are  branch  and  bound  techniques  to  reduce  the  amount  of 
data  required  so  quicker  computation  is  possible  (Hand.  1981).  It  also  has  the 
theoretical  disadvantage  of  not  being  a  pdf  (Hand,  1981).  Assuming  parametric 
forms  for  the  class-conditional  pdfs  allows  quicker  classifications  of  new  samples 
and  no  large  databases  of  training  set  points  are  necessary  to  retain.  However,  an 
incorrect  distributional  assumption  will  incur  an  associated  cost  in  terms  of  an  in¬ 
creased  misclassification  error  rate,  bvU  this  cost  may  he  acceptable  if  computa¬ 
tional  advantages  outweigh  it. 

2.3.2  Other  Estimation  Approaches  Considered 

Several  other  estimation  approaches  investigated  during  the  study  were 
weighted  likelihood  ratio  and  logistic  di.scrimination.  These  methods  were  not  used 
to  generate  final  study  results  for  their  results  vs'ere  not  as  good  as  the  others. 
However,  weighted  likelihood  ratio  processing  is  described  here  as  it  was  applied  to 
data  proposed  as  a  future  application  for  the  developed  HOS  approach.  Logistic 
discrimination  is  described  because  of  its  similarity  to  the  linear  discriminant  ap¬ 
proach. 

Milner  (1988)  found  that  a  likelihood  ratio  weighting  technique  of  vibration 
power  spectra  to  be  superior  in  detection  performance  for  a  wide  range  of  probicm.s 
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in  pump  and  fan  data.  This  approach  assumes  a  Gaussian  density  function  of  the 
logarithmic  amplitude  of  the  power  sp3ctrum.  The  binary  test  hypotheses,  (a) 
power  spectrum  indicates  new  object  (K,),  or  (b)  power  spectrum  indicates  slightly 
used  object  (ATn),  alter  this  Gaussian  density  in  mean  level  only,  and  the  power  es¬ 
timates  of  each  bin  or  frequency  are  assumed  independent.  Given  the  definition  of 
the  natural  logarithm  of  the  likelihood  ratio  derived  from  the  Bayes  criterion  for 
binary  hypotheses,  the  log  likelihood  ratio  is: 


In  (Si) 


J_ 

2 


[2] 


where  S,  is  amplitude  of  the  object  vibration  power  spectrum  in  decibels  (dB)  at 
frequency  i,  w,,  is  average  log  amplitude  of  frequency  i  of  new  object  vibration 
power  spectrum,  Wo*  is  average  log  amplitude  of  frequency  i  of  slightly  used  object 
vibration  power  spectrum,  and  M  is  the  number  of  useful  frequency  tones.  Com¬ 
pleting  the  square  and  canceling  common  terms  [2]  becomes: 


In  ( ) 


M 


/=  I 


[3] 


If  [.1]  is  greater  than  zero,  the  object  is  classified  as  slightly  used.  If  f.l]  is  less  than 
or  equal  to  zero,  then  the  object  is  classified  as  new.  Implementation  of  this  test 
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First  computes  power  spectra  for  the  set  of  new  objects  and  for  the  set  of  slightly 
used  objects.  Then  values  for  m,i  and  mo,  are  computed,  and  the  bins  with  the 
largest  mean  shift  as  compared  to  their  stability  receive  the  largest  weights.  These 
weights  are  consequently  indicators  of  the  relative  importance  of  specific  frequen¬ 
cies  as  indicators  of  object  wear.  A  weighted  sum  over  all  useful  frequency  infor¬ 
mation  for  the  particular  time  series  is  computed  to  detect  the  worn  ^ci.dition. 

Weaknesses  of  this  approach  outweighed  any  advantage  of  incorporating 
global  spectral  characteristics.  The  weaknesses  are  assuming  the  class  distributions 
are  diftcrent  in  mean  level  only,  and  that  power  estimates  at  each  frequency  are 
independent  so  a  diagonal  covariance  matrix  can  be  used.  These  assumptions  were 
not  appropriate  for  the  actual  experimental  data  analyzed  in  this  study. 

Logistic  discrimination  is  a  partial  distribution  classification  method  as  it 
assumes  the  log-likelihood  ratio  is  linear  in  the  measured  parameter  vectors: 


In 


Po  + 


[4] 


where  /?'  =  (/?i, ...  ,/?,).  Anderson  and  Richardson  (1979)  show  three  advantages 
for  using  logistic  discrimination  versus  a  fully  distributional  or  distribution-free 
classification  approach.  First,  the  model  given  at  [4]  gives  a  simple  form  for  the 
posterior  probabilities: 

pq  -t  In  C-f  P'\ 


(k  I  x)  =  exp 


( 1  +  cxp(/?n'  +  In  C  +  P'\))  ’ 
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where  C  =  fli/rij,  and  H,  is  the  proportion  of  sample  from  K,  with  {s=  1,2).  Sec¬ 
ond,  once  the  parameters  {Pb',P  and  C)  are  estimated,  the  allocation  of  a  new  ob¬ 
servation  or  feature  vector  set  requires  only  a  linear  function  calculation; 

/?o'  4-  C  -1-  fi'x. 

Third,  this  same  estimation  procedure  is  applicable  with  either  continuous  or  dis¬ 
crete  predictor  variables. 

2.3.3  Mathematical  Development  of  Analysi.s  of  Variance 

A  fundamental  step  in  the  development  of  statistical  techniques  based  on 
separation  of  multivariate  populations  is  the  partitioning  of  the  total  sample  vari¬ 
ance  into  components  representing  within  class  variation  (variance  of  individual 
observations  about  their  class's  centroid),  and  among  class  variation  (variance  of 
individual  observations  around  the  centroid  for  the  combined  sample).  This  parti¬ 
tioning  process  is  the  multivariate  equivalent  of  the  partitioning  sum-of-squares 
accomplished  in  the  univariate  analysis  of  variance  (ANOVA)  model.  In  univariate 
problems,  hypotheses  concerning  equality  of  means  can  be  tested  using  the  two 
sample  t-test  when  two  groups  are  involved,  or  F-tests  using  statistics  derived  from 
one-way  ANOVA  when  multiple  groups  are  considered.  In  multivariate  analysis, 
equality  of  mean  vectors  or  centroids  across  groups  or  populations  are  tested.  For 
the  two  population  case.  Hotelling's  /''provides  the  multivariate  equivalent  to  the 
two  sample  t-test  and  test  statistics  derived  from  one-way  multivariate  analysis  of 
variance  (MANOVA)  provide  the  appropriate  hypothesis  tests  for  the  multiple 
population  situation.  When  there  arc  only  twm  groups  or  classes  as  in  this  research 
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study,  the  one-way  MANOVA  is  equivalent  to  the  two-sample  Hotelling's  7  '  test 
and  this  is  the  presented  approach. 

2.3.3.!  Partitioning  of  Variance 

Consider  developed  expressions  representing  the  partitioning  of  an  arbi¬ 
trary  linear  combination  of  measurement  variables  into  within  and  among  class 
components.  Notation  is  defined; 

jr,y*  /  =  1,2, j  ~  \,2, ...  ,g  represents  the  observed  value  on  the  j"’ 
variable  from  the  i"'  case  in  the  k""  class.  There  are  g  groups  or  classes  and  p 
measured  variables.  Class  k  includes  n^  observations. 

&k  -  [.>c,u  ...  x,,*]'  is  a  p-elemenl  column  vector  representing  the  complete 
multivariate  observation  for  the  /'*  sample  in  the  k"'  class. 

£k  —  Ljfi*  is  a  p-element  column  vector  representing  the  centroid 

of  the  k'*  class. 

The  elements  of  x*  are  the  sample  means  for  each  variable  computed  for 

observations  in  the  k"'  class,  and  denoted  x,k  ',  j  —  I,  2, ...  ,  p. 

-  I  ’  _ 

=  {'jp)J^nk  Xt  is  a  p-elcment  column  vector  rcpre.senting  the  combined 

*  =  I 

class  centroid. 

The  elements  of  x  arc  the  sample  means  of  each  variable  computed  for 

* 

observations  from  all  g  classes,  and  n  is  the  total  sample  size:  n  —  ^n^. 

k  I 

Consider  a  special  vector  representation  of  the  matrix  of  sums-of-squarcs 
and  crossproducts  of  deviations  from  the  mean.  This  matrix,  when  divided  by  (n-1) 
for  n  total  observations,  is  the  sample  covariance  matrix.  Also  consider  the  fol- 
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lowing  product  of  the  vector  of  deviations  from  the  combined  class  centroid  for  a 
specific  observation  and  its  transpose: 

^nk  ~  ^2  _  _  _ 

(£ik  ~  £)(£lk  ~  i)  ~  ””  ■^2)'  '(^ipk  ~~ 

^Ipk  -  ^p 

(-f/1  fc  -  ^1  f  {Xnk-  ^1  ){xnk  -x-i)  ...  (jr„  *  -  J,  )(x^*  -  x^) 

=  (Xi2k-X2f...  {Xi2k~X2)(x,p„~Xp)  .  [5] 

{X/pk  Xp) 

The  matrix  which  results  from  this  multiplication  of  a  column  times  a  row  vector 
is  a  p  X  p  matrix  of  squares  and  crossproducts  of  the  deviations  of  the  observation 
for  each  variable  from  the  corresponding  sample  mean.  If  these  vector  products 
are  calculated  for  each  observation  in  the  A'*  class  and  the  results  are  summed,  the 
following  square  matrix  will  result: 


'^{Xik  -  £)(£ik  -  i)' 
i=  1 
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Summing  [6]  across  g  classes  results  in  another  symmetric  matrix  which  looks  like 
[6]  except  for  the  double  summations  which  accumulate  results  for  all  observations 
across  the  g  classes  has  the  sums-of-squared  deviations  of  each  variable  from  its 
mean  on  the  diagonal.  The  off-diagonal  elements  are  sums  of  crossproducts  of  de¬ 
viations  from  the  mean  for  all  pairs  of  variables.  If  this  matrix  is  scalar  multiplied 
by  l/(rt  —  1),  the  sample  covariance  matrix  for  the  combined  class  {total  covariance 
matrix)  is  obtained.  The  summation  of  [6]  acro.ss  all  classes  is  the  total  .sums-of- 
squares  and  crossproducts,  and  is  denoted  by  T; 

7' =  ^  [7] 

<:=  I  /=  I 

The  total  covariance  matrix  is: 

g 

^  ’  fe  =  I 

A  similar  computation,  performed  by  substituting  the  centroid  for  the  k"' 
class  for  the  overall  centroid  and  summing  only  over  the  observation  subscript  (i), 
yields  the  within  class  sums-of-squarcs  and  cros.sproducts  matrix  denoted  by  IT",  for 

/=  1 


the  k"‘  class: 


If  If'*  is  scalar  multiplied  by  !/(«*  —  1),  the  sample  covariance  matrix  for  the  k""  class 
is  obtained. 

For  the  discriminant  analysis  problem,  the  T  matrix  is  partitioned  into 
matrices  attributable  to  the  within  class  (W)  and  among  class  (A)  differences.  This 
partitioning  process  is  analogous  to  the  partitioning  of  the  total  sum-of-squares  in 
the  univariate  ANOVA  model  except  this  is  working  with  vector  rather  than  scalar 
quantities.  If  the  W  matrix  is  multiplied  by  l/XC''*  ~  g),  result  is  a 

pooled  estimate  of  the  covariance  matrix  or  within  class  covariance  matrix 
(multivariate  analogy  of  the  pooled  estimate  of  variance  used  in  univariate  two- 
sample  t-test  and  the  pooled  estimate  of  the  error  variance  used  in  univariate 
ANOVA).  This  matrix  is  denoted  by  5.  where  S,  =  —  1  -  W. 

There  arc  different  ways  to  manipulate  combinations  of  time  scries  meas¬ 
urement  or  feature  deviations  from  their  centroids  in  the  development  of 
discriminant  functions  and  statistical  tests  for  centroid  differences.  In  a  discussion 
rcsinctcr  to  only  linear  discriminant  functions,  g,  computational  advantages  will 
result  from  a  g  that  is  linear  in  the  components  of  the  observation  measurements 
jr,  or  features  which  are  functions  of  the  x.  Considering  only  manipulating  linear 
combinations  of  feature  vector  deviations  from  their  centroids,  scores  like  the  fol¬ 
lowing  arc  calculated: 

p 

fik  ^  ~ 

>=  I 

where  a,  is  a  coefficient,  i  represents  an  observation  index  or  time  scries,  and  j  is 
a  feature  variable  index.  In  matrix  terms,  flO]  is: 
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fik  =  a'Uik  -  JC)  where  g  = 


[II] 


The  partitioning  of  the  total  sum-of-squares  of  the /*  scores  into  within  and 
among  class  components  is  required  for  developing  the  discriminant  function.  Since 


k=\ 1=1 


-  a  ’Ta 

and  recalling  the  partitioning  of  the  T  matrix: 


[12] 


a'Ta  =  a''^IV^a  +  a  Ag  -  g'fVa  +  g'Ag.  [13] 

(=  I 

[13]  is  an  equation  with  scalar  terms  as  pre  and  post  multiplication  of  the  p  x  p 
matrices  by  g'  and  g  results  in  I  x  I  matrix  products.  Pre  and  post  multiplication 
by  g  results  with  the  first  term  of  [13]  representing  the  sum-of-square  values  of  the 
linear  function  defined  by  the  cocfTicicnts  in  g  evaluated  for  deviations  of  each  fea¬ 
ture  vector  observation  from  its  class  mean.  The  final  term  of  [13]  is  a  weighted 
sum-of-sqiiared  values  of  the  linear  function  evaluated  for  deviations  of  class 
centroids  from  the  combined  class  centroid.  Thus,  [13]  is  used  to  partition  the  total 
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variance  in  discriminant  function  scores  into  the  between  and  within  groups  com¬ 
ponents  as  c!  represents  the  within  group  sum-of-squares,  aAa  the  among  group 
sum-of-squares,  and  a'Ta  the  total  sum-of-squares.  Considering  a  as  a  vector  of 
discriminant  function  coefficients  obtained  using  the  Lagrange  multiplier  solution 
technique  to  the  maximization  problem: 


Find 


a' Aa 


a'fFg 

Subject  to  :a'fVg  =  n 


g. 


also  obtains  g'Ag  =  —  ^).  The  restriction  a'  =  «  —  g  imposed  to  finding  the 

optimal  discriminant  function  allows  [13]  to  be  rewritten: 

a'  Tg  =  {n-g)+  Hn-  g)  =  {n-g)(l  -I-  /!).  [14] 

Since  a'Ag  =  A{n  —  g)  is  the  among  group  sum-of-squares,  or  the  group  separation 
“explained"  by  the  specific  discriminant  function  which  a  defines,  a  reasonable 
measure  of  the  power  of  this  discriminant  function  is  the  fraction  of  sum-of-squares 
“explained”: 


Hn  -  g)  _  A 
(1  +  ;,)(« -g)  ■  fl  +  A)  ■ 


[LS] 


fhe  square  root  of  [1.5]  is  the  canonical  correlation  coefficient  and  is  an  indicator 
of  the  power  of  a  specific  discriminating  function.  Another  evaluation  of 
discriminant  function  cfTectivcness  can  be  obtained  by  examining  its  statistical  sig¬ 
nificance.  Tests  for  significance  based  on  properties  of  the  within  and  among 
groups  matrices  is  examined  next. 
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2.3.3.2  One-Way  ANOVA  and  MANOVA 

Consider  the  univariate  one-way  ANOVA  model: 

[16] 

where  xn  is  the  observed  value  of  an  interval  sealed  criterion  variable,  is  the 
overall  mean,  a*  is  an  effect  due  to  the  presence  of  the  k""  treatment  or  experimental 
condition,  and  is  an  error  term  analogous  to  the  e,  term  in  the  multiple  regression 
model.  In  this  model,  i  indexes  a  specific  observation  in  one  of  the  g  groups  of 
observations  collected  using  the  various  experimental  conditions.  By  letting 
Hi  =  n  +  a»,  where  represents  the  /c'*  group  mean,  [16]  is 


Xik  =  +  [17] 

If  the  random  errors  are  independent  normally  distributed  with  common  variance, 
statistical  tests  for  the  significance  of  the  ot,..s  are  performed.  The  hypothesis  tested 
is: 


77o  :  Ml  =  /^2  =  ■  = 

//|  :  at  least  two  differ. 

The  significance  of  the  differences  among  group  means  is  interpreted  by 
partitioning  the  total  sum-of-squares  of  .r,»  deviations  from  their  sample  mean  into 
the  within  groups  component  which  represents  an  estimate  of  error  variance,  and 
the  among  groups  component  which  measures  deviations  from  the  null  hypothesis 
of  no  group  clifTcrcnccs.  i.et  represent  the  sample  mean  for  the  k"'  group,  and  x 
the  overall  sample  mean.  The  partitioning  is  then: 
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fc  =  I  1  =  1 


g 

ll 

k=\ i= 1 


[(Jf/*  -  ^k)  +  (-*■*  - 


which  equals: 


g  ''k  S  "*  g  ’'k 

ll 

*=i/=i  fe=i/=i  *=11=1 

where  the  term  on  the  LHS  of  [18]  stands  for  the  total  sum-of-squares  (S5r),  the 
first  term  of  the  RflS  of  [18]  is  the  within  groups  sum-of-squares  (SSw),  and  the 
second  term  of  the  RHS  of  [18]  is  the  among  groups  sum-of-squares  (SSa)-  If  the 
null  hypothesis  of  no  group  mean  difierences  is  true,  the  among  group  sum-of- 
squares  should  be  very  small  with  most  variation  due  to  the  within  groups  compo¬ 
nent.  If  the  error  terms,  r,*  arc  independent  and  normally  distributed  with  common 
variance  and  zero  mean,  the  following  .statistic  tests  the  group  dincrence  hypothesis: 

^  SSJ{g-l) 

"  SS^.I{n-g)  ■ 

The  statistic  F,  is  distributed  as  an  F-statistic  with  g-l  and  n-g  degrees  of  freedom. 
Targe  values  of  F„  lead  to  the  rejection  of  the  hypothesis  that  all  group  means  are 
identical. 


{xik  -  xf 


ll 


xf 


[18] 


In  the  multivariate  equivalent  to  the  one-way  ANOVA,  scalar  elements  arc 
simply  replaced  by  vectors  so  **  is  a  p-elcment  observation  vector  with  elements 
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x,n,  ^  is  a  p-element  centroid  for  the  overall  population  centroid,  a*  is  the  effect  of 
the  A'*  treatment,  and  gy*  is  an  error  vector.  Hence,  the  tested  hypothesis  is: 

Ho  :  if!  =  =  -—/if 

//,  :  at  least  two  differ. 

Rejecting  the  null  hypothesis  leads  to  the  conclusion  that  there  is  a  difference 
among  some  of  the  group  centroids. 

Similar  to  the  univariate  case,  the  significance  of  the  difference  among 
centroids  is  investigated  by  partitioning  the  sum-of-squared  deviations  of  the  ob¬ 
servation  vectors,  **,  from  the  combined  sample  group  centroid  denoted  by  a.  This 
is  the  same  problem  where  the  partitioning  process  results  in  expressing  the  total 
sum-of-squares  and  crossproducts  matrix,  T,  as  the  sum  of  within  and  among 
groups  components,  W  and  A,  or  T  =  IV  +  A.  If  the  null  hypothesis  is  true,  matrix 
W  will  be  similar  to  matrix  T.  The  evaluation  of  the  relative  magnitude  of  within 
and  among  groups  sum-of-squares  is  complicated  by  the  fact  that  they  are  p  x  p 
matrices  However,  Wilks  (1963)  developed  a  test  based  on  the  determinants  of  the 
W  and  T  matrices.  His  procedure  represents  a  likelihood  ratio  test  of  the  hypoth¬ 
esis  that  all  groups  have  identical  centroids  and  the  Wilks'  lambda  statistic,  A,  is: 

A  =  ^  llTl 

\A  +  IV]  |7| 

1  hus,  it  is  seen  that  small  values  of  A  lead  to  rejection  of  the  null  hypothesis  of  no 
group  centroid  differences.  I  he  sampling  distribution  of  A  is  complex  because  the 
number  of  groups  (g),  ob.scrvations  (n).  and  variables  (p),  arc  all  parameters,  but 


various  approximations  for  evaluating  A  are  available.  One  is  the  F-statistic  for  the 
one-way  MANOVA  model  developed  by  Rao  (Tatsuoka,  PTl): 
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=  1- 


A 


l/r 


.1/1 


ms -pig  -  l)/2  + 


1 


pig  -  1) 


where  p  is  the  number  of  variables,  g  is  the  number  of  groups  or  treatments, 
m  —  n  —  I  -  (p  ^  g)j2  and 


5 


P\g~  1)^ 


_ 4 _ 

P^  +  (g-  l)'-5 


where  s  =  1  if  the  numerator  and  denominator  equals  zero.  The  statistic  F„  is  dis¬ 
tributed  as  where  v,  =  p(g  —  1)  and  v,  =  ms-  p{g  -  I)/2  -I-  I.  The  critical 

region  for  the  test  is: 


Reject  //„  :  =  =  pj, 

if  F„>  where  a  is  the  significance  level  for  the  test,  and  v,  and  Vj  are  the  nu¬ 
merator  and  denominator  degrees  of  freedom,  respectively. 

2.3.4  Clas.sifier  Performance  Measurement  Criteria 

A  critical  aspect  for  comparing  alternative  feature  extraction  approaches 
input  to  several  classification  algorithms  is  a  fair  and  consistent  estimation  of  their 
total  misclassificrtion  erro”  rates.  Lachcnbruch  (197,'')  discusses  the  Icaving-one- 
out,  OT  jarknife,  method  for  computing  error  rate  estimates.  This  method  estimates 
tlie  discriminant  rule  omitting  one  sample  time  scries  and  then  applies  that  rule  to 
classify  the  remaining  observational  time  scries.  Misclassifications  arc  tallied  aficr 
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the  jacknife  process  is  iteratively  done  for  all  observations.  The  various  types  of 
error  rates  are  discussed  in  the  next  section.  Another  measure  of  classifier  per¬ 
formance  is  the  total  cost  of  misclassification.  Stated  ''rom  a  decision  theory  per¬ 
spective,  a  l  ype  1  error,  or  lower  probability  of  detection,  is  worse  than  a  Type  11 
error,  or  higher  probability  of  false  alarm.  An  actual  total  classification  cost  can 
be  computed  for  alternative  classification  approaches  if  the  respective  misclassi¬ 
fication  costs  are  known. 

The  easiest  way  of  estimating  a  classifier's  misclassification  rate  is  calcu¬ 
lating  low  many  of  the  design  or  training  set  observations  are  misclassificd.  Fiarly 
work  in  pattern  recognition  research  implemented  this  approach  but  it  was  discov¬ 
ered  that  the  estimated  error  rates  underestimated  the  actual  error  rate  of  the 
classifier.  This  is  because  the  classifier  or  decision  rule  is  optimized  on  the  design 
set  and  unless  this  set  perfectly  represents  the  population  distribution,  a  new  set. 
which  is  random  sample  drawn  Irom  the  .same  distribution,  will  be  different  and  so 
the  classifier  will  not  be  optimal.  The  error  rate  calculated  by  reclassifying  the  de¬ 
sign  set  is  called  the  apparent  error  rate.  This  is  distinguishaole  from  the  actual  error 
rate  which  is  the  expected  error  rate  of  ..be  constructed  classifier  on  future  samples 
from  the  same  population  distribution  as  the  training  set. 

The  simple  approach  of  reclassifying  the  design  set  has  an  optimistic  bias 
so  researchers  investigated  methods  for  estimating  the  actual  error  rate  from  a  de¬ 
sign  set  (Hills,  19dd).  However,  problems  with  still  optimistically  biasing  the  esti¬ 
mation  of  actual  error  rate  caused  rc.scarchcrs  to  concentrate  on  estimating  the 
expected  actual  error  rate.  One  way  to  estimate  the  expected  actual  rate,  the 
Icaving-onc-out  method,  gives  an  unbiased  estimate  and  works  weP  with  non- 
(jaussian  observations  (Laclicnbruch  197.5).  I'his  is  the  method  for  computing 
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classification  error  rate  estimates  for  the  experimental  data  in  this  research.  But 
another  research  strategy  was  not  to  restrict  feature  performance  comparisons  by 
constructing  a  classifier  and  estimating  its  error  rate  from  only  a  design  set  of  data. 
In  simulation  experiments,  other  independent  time  .series  were  generated  and  used 
as  test  sets  so  application  of  a  designed  classifier  to  these  new  sets  yielded  a 
straightforward  estimate  of  the  actual  error  rates  for  the  various  feature 
extraction/classification  approaches.  Special  considerations  are  now  discus.sed 
about  discrimination  and  classification  of  data  that  are  in  time  series  form. 

2.4  Discrimination  and  Classifleation  of  Time  Series 

There  are  two  major  approaches  for  viewing,  analyzing,  and  interpreting 
time  series-one  based  on  the  time  domain  and  another  based  on  the  frequency  or 
spectral  domain.  The  theoretical  development  of  time  .series  methodology  has  ex¬ 
hibited  a  leader-follower  pattern,  first  empha.sizing  one  domain,  then  the  other. 
Spectral  (Fourier)  analysis  decomposes  functions  representing  fluctuating  phenom¬ 
ena  in  space  or  time  into  sinusoidal  components  that  have  varying  frequencies, 
amplitudes,  and  phases.  Several  texts  specify  the  necessary  mathematical  condi¬ 
tions  for  the  existence  of  the  F'ourier  transform  (Brigham,  1974  and  Braccwcll, 
1978).  Here,  let  it  suffice  that  the  I'ouricr  transform  docs  exist  for  waveforms 
physically  observed  in  nature  (Braccwcll,  1978).  For  a  given  random  process, 
X  =  {.»:(/),  /  6  R},  the  continuous  Fourier  transform  (C'FT)  definition  X{f),  is; 
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and  the  inverse  Fourier  transform  definition  jr  (r)  is: 

oo 

•oo 

The  inverse  Fourier  transform  shows  how  the  x  random  time  function  can  be  de¬ 
scribed  by  a  superposition  of  complex  sinusoids  with  the  amplitude  and  phase 
of  those  sinusoids  lying  in  the  spectral  band  between  /  and /+  df  defined  by 
X  {f)df.  Hence,  X  (/)  is  a  complex  amplitude  spectral  density  function.  For  ex¬ 
ample,  if  X  has  the  dimensions  of  volts,  then  X  (/)  has  the  dimensions  of  voIts/Hz. 
In  addition  to  the  Fourier  transform  being  a  function  that  represents  the  amplitude 
and  phase  at  each  frequency,  it  is  an  effective  tool  mathematically,  statistically,  and 
computationally.  It  is  of  great  mathematical  use  because  the  convolution  operation 
occurs  so  often  and  is  greatly  simplified  by  the  Fourier  transform.  Statistically,  the 
large  sample  properties  of  the  Fourier  transform  are  much  simpler  than  those  of  the 
corresponding  time  domain  quantities,  ('omputationally,  fast  Fourier  algorithms 
allow  evaluation  of  interested  parameters  more  rapidly  with  smaller  round-off  error 
than  by  direct  time  domain  evaluation.  Spectral  analysis  has  an  inherent  consist¬ 
ency  and  efficiency  in  its  application  because  the  power  spectrum  and  all  higher- 
order  spectrum  density  functions  use  the  estimates  provided  by  the  direct  discrete 
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Fourier  transform  (DFT)  of  the  raw  time  series.  For  these  reasons,  spectral  analysis 
is  the  time  series  processing  method  implemented  for  feature  development  and 
subsequent  input  to  a  classifier.  The  decision  for  employing  a  feature  extraction 
approach  rather  than  an  optimality  approach  is  discussed  next. 

Consider  the  lime  series  discrimination  problem  where  the  observation  of 
a  discrete  parameter  time  series  x  at  each  of  T  points  in  time  is  given  and  the 
standard  objective  is  to  classify  the  observed  time  series  into  one  of  k  mutually  ex¬ 
clusive  and  exhaustive  classes  with  an  overall  low  misclassification  error  rate.  The 

univariate  sample  time  series  can  be  repre.sented  as  x  =  (jr(0) . x{T -  1))  and  so 

the  classification  problem  concerns  finite  dimensional  random  vectors  where 
standard  multivariate  approaches  are  applied.  However,  Neyman-Pearson  likeli¬ 
hood  or  Bayes  criterion  rules  are  usually  applied  to  classifying  multivariate  vectors 
where  T  is  small,  and  the  learning  population  is  adequate  for  estimating  the  un¬ 
known  parameters.  Generally,  this  is  not  the  case  for  time  series  data.  For  exam¬ 
ple,  the  simulated  time  series  analyzed  in  this  study  have  T  of  approximately  1 200, 
and  the  learning  populations  contain  a  maximum  of  250  time  series.  Furthermore, 
the  actual  time  series  data  analyzed  in  this  study  have  T  of  approximately  2500,  and 
the  learning  populations  contain  a  maximum  of  60  time  series.  Thus,  the  compu¬ 
tations  for  discriminant  function  calculation  and  performance  evaluation  will  in¬ 
volve  inversion  of  large  covariance  matrices  which  arc  also  not  of  full  rank.  Hence, 
the  numerical  difficulties  of  time  domain  calculations  motivated  investigation  of 
other  approaches  for  time  series  discrimination. 

Shumway  (1982)  gives  two  distinct  alternative  approaches  of  spectral  time 
scries  discrimination.  The  first,  or  optimality  approach,  assumes  very  specific 
Gaussian  models  and  solutions  arc  developed  to  satisfy  definitive  minimum  error 
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criteria.  This  approach  generally  requires  prior  knowledge  of  the  time  series  or 
signal  waveforms  so  that  either  linear  or  quadratic  discriminant  functions  can  be 
constructed.  Shumway  discusses  the  use  of  the  frequency  domain  discriminant 
analysis  approach  where  matrix  inverses  can  be  replaced  by  simple  sums  involving 
discrete  Fourier  transforms  (DFT)  and  .spectral  density  functions.  Since  the  DFT 
of  a  weakly  stationary  process  produces  nearly  uncorrelated  random  variables  and 
variances  approximately  equal  to  the  power  spectrum,  estimation  and  hypothesis 
testing  problems  are  formulated  in  terms  of  sample  spectral  densities  with  simple 
approximate  distributions.  Shumway  (1982)  gives  results  which  make  discriminant 
analysis  in  the  frequency  domain  framework  a  very  promising  approach.  But 
Shumway  noted  the  danger  or  limitation  of  the  optimal  approach  (either  time  or 
frequency  domain)  is  the  fact  that  inappropriate  assumptions  of  time  scries  distrib¬ 
ution  structure  can  cause  an  “optimum”  solution  to  be  only  a  rough  approximation 
to  the  actual  problem.  In  fact,  the  time  series  measurements  within  the  exper¬ 
imental  databases  in  this  research  were  found  to  be  highly  non-Gaussian  and  non¬ 
linear  based  on  the  Hinich  bispectrum-ba.sed  .statistical  tests  (Hinich,  1982).  Thus, 
feature  extraction,  the  other  distinct  approach  to  time  scries  discrimination  and 
classification,  was  the  approach  followed  in  this  study.  A  MOS  feature  extraction 
algorithm  developed  to  combine  various  types  of  .spectral  features  of  the  simulated 
and  actual  time  series  is  discussed  in  (Hiapter  4.  Time  series  arc  most  ofien  real¬ 
izations  from  a  stochastic  process  and  mathematical  representations  called 
covariance  functions  u.sed  to  characterize  its  behavior  are  defined  next. 
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2.5  Stochastic  Processes  and  Their  Covariance  Functions 

A  stochastic  process  x  =  {jt(r),  re  T}  is  a  collection  of  random  variables 
that  describes  the  evolution  through  time  of  some  physical  process.  The  index  set 
of  the  process,  T,  is  usually  the  set  of  integers  (discrete)  or  the  set  of  real  numbers 
(continuous).  Consider  here  stochastic  processes  which  are  discrete-time  processes 
so  X  =  {jf(/„)  «  6  N)  is  a  sequence  of  random  variables.  Let  the  means  of  {jr(r,)}  be 
represented  by  The  M'*-order  covariance  of  x  is: 

•  ‘n)  =  E[x{t^  -  /i,)jf(/2  -  t^2)  “  MJ} 

=  £{x(r,)jr(/j) ...  x(/„)  -  - - 

-  H^x{t2)  -  ...  -  ^xit„)  +  I^„} 

R„{t,T)  =  £'{x(f,)af(/,  -l-T)...Jir(r„_,  +  t)}  -  n„E{xit„  _  ^)}  -  - H2E{x{t^)) 

-  -  ^M„_^E[x{t„  +  t)}  -H^E{x{tf  +  t)}  -f-  M1M2  M;, 

=  E(x{ty)...x{t„_^  +  r)}-n^H2.t^n 

=  £{x(r,)  -  +  t)}. 

where  the  marginal  terms  all  vanish  as  it  is  always  assumed  in  this  discussion  that 
random  variable  means  are  zero  and  that  x  has  finite  order  moments.  Clearly,  there 
are  many  possible  orders  (number  of  random  variables  in  the  joint  probability  dis¬ 
tribution)  that  are  used  for  describing  x,  but  concern  is  presently  with  the 
covariance  of  iwo  random  time  variables  or  the  second-order  covariance  function. 
Let  the  means  of  jc(t,)  and  jc(f2)  be  represented  by  /i,  and  respectively.  The 
second-order  covariance  is: 
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l^xxih^h)  =  E{x(h  -  t^\)xih-  ^^2)) 

=  E{xit^)x(t^)  -  ll^xU^)  -  Mi-*-(/2)  +  filial) 
l^xxU,'^)  =  E{x(tj)x{tj  +  r)}  -  n^FAxitj)}  -  n^FAx{t^  +  t)}  +  [19] 

=  +  t)}  - 

=  E{x{tAxU\  +  t)}. 

where  the  second  term  vanishes  as  random  variable  means  are  zero.  The  second- 
order  covariance  is  thus  a  bivariate  expected  value  which  provides  a  summary  of  the 
degree  which  two  random  time  variables  are  associated. 

When  [19]  is  a  function  of  only  the  time  difference  or  lag  parameter,  r, 
with  T  =  fj  —  /|,  so  that 

^xxih'h)  ~  ^xx(^)> 

the  X  process  is  known  as  a  wide-sense  or  weakly  stationary  stochastic  process. 
Requiring  all  marginal  and  joint  density  functions  of  a  random  process  to  be  time 
independent,  or  strictly  stationary,  is  frequently  too  restrictive  an  assumption  in 
practice  as  it  is  hard  to  find  a  strictly  stationary  random  process.  But  there  are 
physical  situations  in  which  the  process  docs  not  change  appreciably  during  the 
time  it  is  being  observed.  Hence,  wide  .sense  or  weak  stationarity  is  adequate  to 
guarantee  that  the  covariance  of  any  pair  of  random  variables  arc  constants  inde¬ 
pendent  of  the  choice  of  time  origin.  In  these  cases,  the  relaxed  wide  sense  sta¬ 
tionary  assumption  leads  to  a  convenient  mathematical  model  to  closely 
approximate  reality.  However,  it  is  the  mathematical  convenience  when  assuming 
weak  stationarity  which  tends  to  prevent  the  proper  investigation  and  applicability 
of  other  forms  of  joint  random  variable  distributions  of  a  particular  stochastic 
process  under  study.  Even  if  nonlinearity  of  a  stochastic  process  is  addressed 
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through  nos  analysis,  these  studies  also  frequently  assume  stationarity  of  the  ran¬ 
dom  process. 

For  stationary  time  series,  there  exists  a  uniquely  defined  decomposition 
into  a  deterministic  and  a  purely  nondetcrministic  component  which  are  mutually 
orthogonal  (Wold,  1954).  This  decomposition  forms  the  basis  of  a  time-domain 
analysis  of  a  given  stochastic  process  generalizing  the  well  known  properties  of 
stationary  processes.  Also,  spectral  analysis,  rather  than  the  time  domain,  provides 
the  powerful  methods  of  harmonic  analysis  (Wiener  and  Masani,  1957).  Harmonic 
analysis  is  possible  for  stationary  processes  because  spectral  representations  in  the 
form  of  Fourier-Stieitjes  integrals  exist  for  the  process  variable  and  the  associated 
covariance  functions: 


on 

=  — on 

where  A.(f)  i''  a  stochastic  process  with  orthogonal  increments.  Cramer  (1960) 
considers  certain  classes  of  nonstationary  proccs.scs  having  similar  spectral  repres¬ 
entations.  He  shows  that  without  requiring  /!.(/)  to  have  orthogonal  increments, 
one  is  led  to  a  class  of  stochastic  processes  called  harmnnizahle  or  cyclostationary 
processes. 

A  process  is  strictly  cyclostationary’  if 


R{x(t,) ...  jt(/J}  =  E{x{t,  +k  n  ...  x(t„  +  Uy, 


for  A:  e  N,  for  all  n,  and  T  denotes  the  period.  When  [19]  is  periodic  in  t  with  period 
7’  for  a  fixed  time  difference  or  lag  parameter,  t,  the  second-order  covariance  is: 


47 


=  £{4/, -)-7)jc(/,  +  r+T)}. 

and  the  x  process  is  known  as  a  weakly  cyclostationary  stochastic  process. 
Cyclostationary  processes  are  processes  whose  joint  distributions  vary  over  time, 
and  are  thus  nonstationary,  but  whose  parameters  vary  according  to  periodic 
functions.  Cyciostationary  processes  are  a  class  of  stochastic  processes  which  ap¬ 
pear  in  the  physical  world  via  a  mechanism  that  provides  some  deterministic  struc¬ 
ture  in  the  observed  time  series.  The.se  processes  are  therefore  appropriate  models 
for  phenomena  involving  cycles  or  when  there  exists  some  underlying  periodicity  to 
the  data-generating  mechanism. 

To  successfully  deal  with  problems  of  statistical  inference  connected  with 
stochastic  processes,  it  is  crucial  to  have  an  appropriate  and  convenient  type  of 
analytical  representation  for  the  particular  class  of  processes  under  study.  This 
analytical  representation  should  express  in  mathematical  form  the  essential  features 
of  the  random  mechanism  assumed  to  generate  the  process.  This  ensures  accurate 
assessments  are  made  on  the  various  statistical  questions  arising  from  process  gen¬ 
eration.  Consequently,  IIO.S  in  addition  to  power  spectrum,  representations  were 
used  and  developed  in  this  research  so  that  the  intennndulaiion  and  nonlinear  effects 
of  random  fault  mechanisms  of  cyciostationary  processes  such  as  rotating  machine 
systems  are  captured  in  the  physical  process  representations.  Background  infor¬ 
mation  on  existing  IIOS  theory  is  given  next. 
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2.6  Higher-Order  Statistical  (HOS)  Theory 

For  single  time  series,  the  idea  of  polyspectra,  or  higher-order  snectra,  was 
originated  by  Blanc- Lapierre  (1953).  Algebraic  and  analytic  detail  was  provided  by 
Leonov  and  Shiryaev  (1959)  and  Shiryaev  (I960),  who  also  considered  the  spectral 
representation  for  a  cumulant,  rather  than  for  a  product  moment.  Shiryaev  attri¬ 
buted  this  idea  to  Kolmogorov.  Brillinger  (1965)  generalizes  the  definitions  of  these 
earlier  papers  by  considering  k-dimensional  time  series.  Brillinger  (1965)  also  de¬ 
veloped  a  theorem  which  explained  the  importance  of  cumulants  rather  than  prod¬ 
uct  moments.  The  actual  term,  polyspcctrum,  is  due  to  Tukey  who  began  the 
development  of  a  calculus  relating  polynomial  operations  to  higher-order  spectra. 

The  power  spectrum  is  a  complex-valued  function  of  frequency  and  is  de¬ 
fined  as  the  Fourier  transform  of  the  second-order  stationary  covariance  function, 
r,(t)  =  EiX(t)X{t  +  r)l 


P 


(/)=f  rj(T)c-''"^ 

—  OO 


dr. 


Now,  a  specific  case  of  polyspectra  is  the  third-order  spectrum,  or  bispcctrum,  a 
complex- valued  function  of  two  frequencies  and  defined  as  the  double  Fourier 
transform  of  the  third-order  stationary  covariance  function, 
r,(r,,  Ti)  =  /-;[A'(/)A'(t  4-  t,)A'(/  4-  tj)]: 


n(A.f2) 


r,(r,,  rj)e 


dr^  r/rj. 
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From  examining  each  spectrum's  Parseval  relations; 


f  [/(/)]  = 


/>(/)#,  and 


(Inf  J-ooJ-oo 

it  is  clear  that  the  power  spectrum  represents  the  contribution  to  the  second  mo¬ 
ment  over  a  particular  range  of  frequency,  and  the  bispectrum  represents  the  con¬ 
tribution  to  the  third  moment  over  of  a  particular  pair  of  frequencies.  Nikias  and 
Raghuveer  (1987)  list  a  wide  range  of  bispcctrum  applications.  Specific  examples 
of  nonlinear  structure  detected  in  a  variety  of  time  scries  using  bispectral  analysis 
include:  nonlinear  interaction  of  ocean  waves  in  shallow  water  (Ilasselman  et  al., 
1963),  analysis  of  acoustic  gear  noise  (Sato  et  al.,  1977),  and  nonlinear  energy 
transfers  in  plasma  (Kim  and  Powers,  1978).  More  sophisticated  statistical  appli¬ 
cations  of  the  bispectrum  are  within  studies  of  nonlinear  spectral  transfer  of  energy 
in  turbulence  (Lii  et  al.,  1976,  Van  Atta,  1979,  and  Helland  et  al.,  1979).  Pro¬ 
ceedings  from  the  1989  Workshop  on  Higher  Order  Spectral  Analysis  (Nikias  and 
Mendel,  1989)  contain  some  recent  developments  of  bispectrum  theories  and  ap¬ 
plications  of  processing  signals  to  extract  information  based  on  cumulants.  The 
latest  developments  of  llOS  theory  and  its  various  applications  arc  in  the  IFItF 
Proceedings  from  the  1991  International  Signal  Processing  Workshop  on  Higher 
Order  Statistics  (Ocorgel,  1991).  In  this  research,  in  addition  to  the  bispcctrum,  the 
second-order  cumulant  spectrum  not  constrained  by  stationarity,  is  investigated  for 
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providing  feature  information  to  a  multivariate  classifier  to  detect  incipient  failures 
of  rotating  machinery. 

2.6. 1  Moments  and  Cumulants 


Following  Rosenblatt  (1983),  consider  the  random  variables  (X, ,  A'*). 
Let  4>Uu  ,  tk)  be  the  joint  characteristic  function  of  the  random  variables 


k  k 

V  =  ( V,,  ...  ,  Vj^).  v^^O,  |v|  =  ^Vy,  v!  =  I^Vy.' 

7=1  ; = 1 

exist  up  to  a  certain  order  |  v  |  <  k,  they  arc  the  coefficients  in  the  Taylor  expansion 
of  4>  about  zero 
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The  joint  cumulants  c.  =  cum  (Xv^, ...  ,  are  the  coefTtcients  in  the  Taylor  ex¬ 
pansion  of  log  4)  about  zero 

log<^(/)=  ^  (ir)*^+o(|r|*).  [22] 

1*1  <  * 

Kendall  and  Stuart  (1958)  has  formulas  relating  cumulants  of  order  k  or 
less  to  the  moments  of  order  k  or  less,  and  Leonov  and  Shiryaev  (1959)  has  for¬ 
mulas  for  the  inverse  relationship.  The  relationship  of  zero  mean  cumulants  to 
moments  up  to  order  six  art,  shown  on  the  next  page.  Rectangular  brackets  are 
used  to  enclose  cumulants,  and  curly  brackets  to  enclose  expectations.  The  curly 
bracket.s  with  subscripted  numbers  are  used  to  replace  the  enclosed  term  with  the 
sum  of  all  distinct  terms  in  a  combinatorial  fashion  (all  permutations  of  the  indices). 
The  subscript  value  denotes  how  many  terms  are  obtained  from  the  index  permu¬ 
tation  operation. 


=  {X,X^} 


ix,x,x,xj  = 


IX,X,X,XM-]  =  -  {{,V,J2,V,,V4}{«}},5  -  {{X,X,X,}{X,X,X,}),, 
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+  {{X^X:,}{X,X^}{X,X,}),, 

These  relationships  show  that  cumulants  are  expectations  with  lower  order 
statistical  dependence  removed.  If  the  cumulant  of  n  random  variables  is  desired, 
the  expectation  of  the  product  of  all  n  random  variables  is  constructed,  and  addi¬ 
tional  terms  are  added  so  the  net  result  will  completely  vanish  if  any  subset  of  the 
variables  is  independent  of  any  other  subset.  For  the  simple  case  of  n=  2, 

IX.X^-]  ^  {X,X2}-{X,}{X^},  [23] 

the  Rl  IS  of  [23]  vanishes  if  X[  and  Xj  are  independent.  For  n 

\ix,x,x,:\  =  [XM) 

-  -  [.V,T2]{.V,}  -  IX, XM)  - 

In  the  zero  mean  case,  the  last  four  terms  of  the  RHS  of  [24]  arc  zero,  and  so  the 
third-order  cumulant  and  third-order  moment  are  the  same.  If  X,  and  X2  and  T,  are 
independent,  the  entire  RHS  of  [24]  is  zero. 

Rosenblatt  (1983)  showed  that  the  existence  of  all  moments  up  to  order  k 
is  equivalent  to  the  existence  of  all  cumulants  up  to  order  k.  Nevertheless,  the 
bispectrum  is  defined  as  the  Fourier  transform  of  the  cumulant  sequence  rather  than 
the  moment  sequence.  Brillingcr  (1965'  gives  three  reasons  for  this  definition. 
First,  cumulants  have  better  independence  properties  than  moments  as  they  arc 
constructed  so  each  order  cumulant  has  the  dependence  on  lower  order  cumulants 
removed.  Second,  for  crgodic  stationary  stochastic  processes,  Fourier  transforms 
of  cumulants  arc  mathematically  better  behaved  than  Fourier  transforms  of  mo¬ 
ments.  The  third  justification  for  the  use  of  cumulants  is  if  the  process  is  stationary 


=  3, 

■  IWW- 
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Gai'ssian,  then  all  of  its  A:'*-order  moments  for  A  S  3  do  not  provide  any  additional 
information  about  the  process.  However,  the  cumulant  function  does  provide  ad¬ 
ditional  information  as  for  A  >  3,  cumulants  are  zero  for  Gaussian  processes. 
Hence,  the  cumulant,  rather  than  the  moment,  is  the  function  needed  to  detect  de¬ 
partures  from  a  Gaus.sian  structure  or  linearity. 

2.6.2  Mathematical  Propertie.s  of  The  Bispectrum 

Mathematical  properties  of  the  bispectrum  are  discussed  in  many  HOS  lit¬ 
erature  references  but  Hinich  and  Patterson  (1989)  emphasize  the  concepts  of  line¬ 
arity,  Gaussianity,  and  stationarity.  Consider  a  time  series,  x(t)  generated  by  the 
linear  model; 


x(r)  =  ^  a(«)f.(/-«)  [25] 

n  oo 

where  {rCOl  is  a  purely  random  series.  The  weighting  function,  or  impulse  response, 
a(n).  is  real  for  physically  realizable  systems,  and  from  causality,  is  zero  for  negative 
n.  If  the  scries  is  Gaussian,  then  the  original  process  {x(0}  is  also  Gaussian, 
and  has  a  zero  bispectrum.  But  if  the  scries  {c(/)}  is  pure  noise  and  non-Gaussian, 
then  {x(/)}  is  non-Gaussian,  anu  has  a  nonzero  bispectrum.  Also,  [25]  can  be 
nonlinear  if  {r(0},  the  input  procc.ss,  and  a(n)  are  dependent  and  {x(/)}  will  be 
nonlinear  even  if  {e(/)}  is  Gaussian,  and  the  bispectrum  will  be  nonzero. 

Let  {x(0}  be  a  itionary  time  scries  with  zero  mean,  and  assume  that  ai! 
expected  values  and  urns  used  exist.  The  power  spectrum  of  [25]  is  the  Fourier 
transform  of  the  autocovariance  function  C„(t)  =  H  [x(/  -t-  «)x(t)]. 
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OO 

S  (/)  =  ^  C^(t)  exp  {  -i2nfn} . 

rt  =  0 

If  S  (/)  is  constant,  then  {x(/))  is  serially  uncorrelated  on  a  white  noise  process.' 
The  bispectrum  of  [25]  is  defined  as  the  two-dimensional  Fourier  transform  of  the 
bicovariance  function  C„,(«,w)  =  F  [  x(/  -I-  «)  x(/  +  m)  x(/)]  which  does  not  depend 
on  t  because  the  process  is  stationary: 


R  (/1./2 )  =  X!  ^  -i2nf^n  -  i2nf2m}. 

m  n 

I'hc  two  frequency  notation  hides  the  three  frequency  interaction  which  is  impor¬ 
tant  in  bispectral  estimation  applications  so  three  frequency  notation 
if,  g-  ~f~  K )  and  the  Cramer  representation  of  [2.5]  was  introduced  by  Brillinger 
and  Rosenblatt  (1967a): 


x(/)  =  exp  [i2nfn]  dAJf)  [26] 

=—00 

where  {  d/i,{f)}  is  a  complex  stochastic  orthogonal  increments  process,  and  the 
integral  defined  in  [26]  is  in  Sticltjcs  scn.se.  Now,  because  [25]  is  real,  «'/!,(  — /)  is 


I  Whiteness  ol  a  series  docs  not  imply  the  scries  is  purely  random.  I'liis  is  important  as 
some  time  scries  techniques  do  stop  fitting  a  model  when  the  residual  errors  appear  to  be 
white  noise.  I  hc  assumption  of  (laussian  residuals  is  made  for  the  sake  of  convenience 
as  zero  correlation  does  imply  independence  in  the  Gaussian  case,  but  if  the  scries  is  non- 
Gaussian.  this  assumption  can  lead  to  wrong  inferences. 


the  complex  conjugate  of  dA.{f).  The  spectral  density  at  f  of  [25]  is 
{f)4f  ~  dA,{f)  dA.{  — /)},  and  the  bispectral  density  for  h  =  — /—  g  is 
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n  if,  g,  h)  dfdg  =  E  {  dA^if)  dAJ,g)  dA,{h)).  [27] 

Because  of  stationarity,  [27]  is  invariant  to  time  translations  so  for  B(f,g.  h)  df  dg 
to  equal  B  (fg)  df  dg  for  all  f  and  g,  the  sum  f+  g  +  h  must  be  zero. 

When  (x(/)}  is  linear,  [27]  is  shown  by  Brillinger  (1975)  to  be 

R  if  g,  h)  dfdg  =  u,A{f)A(g)A{h)  [28] 

where  A  if)  is  the  transfer  function  of  the  impulse  respon.sc  a(t),  =  E  and 

{c(/)}  is  the  innovation  process.  The  spectrum  of  the  linear  process  [25]  is 

Hf)  =  o]A{f)Ai  -f)  [29] 

where  a]  is  the  innovation  process  variance. 

The  right  hand  side  of  [28]  is  invariant  to  permutations  of  the  frequency 
indices/,  g,  and  h  -  —f—g.  Thus,  the  bispectrum's  symmetry  lines  are  as  shown 
in  Figure  2.1  on  page  56.  .Symmetry  means  that  if  values  of  the  bispcctrum  are 
known  at  all  points  in  one  region  about  a  .symmetry  relation,  values  in  the  other 
region  can  be  determined  through  either  a  permutation  and/or  conjugation  opera¬ 
tion. 


56 
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Figure  2.1:  Symmetries  of  Bispectnim 


Because  d/i.(  -/)  =  dA\{f),  li  {  ~f,  -  g,  -  h)  df  dg  =  B’  {/,  g,  h)  df  dg.  This  skew 
symmetry  gives  another  three  symmetry  lines: 

g^  -f,h  =  -/(g  =  0),  and  h-=  -  g  (/=  0). 


Thus,  the  cone, 


C  =  {/,g:  0</.0<g</), 

is  the  principal  domain  region  of  the  continuous-time  bispectrum  in  the  (/.g )  plane. 
Principal  domain  is  the  minimum  region  or  frequency  space  which  estimates  arc 
computed. 

In  physical  reality,  a  continuous-time  process  is  always  sampled  for  some 
finite  period,  so  investigated  processes  are  band  limited  at  some  cutoff  frequency 
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/e.  Contribution  of  frequencies  above  the  cutoff  frequency  to  the  process  variance 
is  therefore  zero,  and  so  the  continuous-time  bispectrum  is  cut  off  at  /  =  +/!, 
g  —  ±  f„  and  f+g=  ±fc-  Thus,  the  continuous-time  set  of  positive  support  for 
absolute  value  of  [28]  is  the  right  isosceles  triangle 

77'=  {f,g  ;  0</</„0<j?  </,/-(- ^  =/,} 

shown  at  Figure  2.2  on  page  58.  But  there  is  also  a  discrete-time  bispectrum  where 
Hinich  (1989)  shows  the  discrete-time  bispectral  density  with  r  as  the  sampling  in¬ 
terval; 


l(  m  n 

(k  +  m  +  tt) 

forf+g  +  hA - ^ -  =  0,  with  signed  integers  k,m,n  restricted  to  keep  the 

indices  in  the  bispectrum's  principal  domain  (Brillinger  and  Rosenblatt,  1967a).  But 
since  there  is  band  limitation  at  /„  the  summation  is  restricted  to  k,m,  and  n  such 
that 


0<f+-^<f,,  0<g  +  ^<f„ 


and 


0<h  +  ^ 


{k  +  m) 


Sampling  actually  causes  an  infinite  number  of  parallel  symmetry  lines  defined  by 
If  g  ~  -Y  and  /+  2g  —  -y.  'T  he  cone  (’  in  Tugurc  2.1  on  page  56  is  cut  by 
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both  of  these  symmetry  relations,  but  for  a  particular  n”,  the  line  f  +  2g  =  —  is 
at  least  to  the  line  2f+g—  —  when  both  lines  are  within  C.  Hence,  the  principal 
domain  of  the  the  discrete-time  bispectrum,  B,  is  the  triangle 

|/,g:  0^/^^,  O^g^f,  If+g  =  "t}' 

which  is  a  proper  subset  of  C.  This  triangle  is  the  union  of  the  sets  IT  and  OT  in 
Figure  2.2.  Statistical  tests  for  Gaussianity  and  linearity  (Hinich,  1982)  and  alias¬ 
ing  (Hinich  and  Wolinsky,  1988)  of  time  series  data  use  estimates  calculated  over 
this  discrete-time  bispectrum  principal  domain  region. 
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2.6.3  Bispectrum-Based  Statistical  Tests 

Ashley  et  al.  (1986)  showed  the  Minich  bispectrum-based  statistical  tests 
(Hinich,  1982)  to  have  substantial  detection  power  for  many  common  forms  of 
nonlinear  serial  dependence  (bilinear,  nonlinear  and  threshold  autoregressive,  non¬ 
linear  moving  average).  Also  demonstrated  was  that  the  bispectral  linearity  test  can 
be  applied  to  raw  source  data  as  well  as  to  the  fitting  errors  of  an  estimated  linear 
model.  Consider  now  the  development  of  these  statistical  tests. 

If  the  mechanism  generating  a  time  series  has  non-zero  terms  in  the  third- 
order  cumulant  function,  then  the  bispectrum  will  be  nonzero  and  vary  with  fre¬ 
quency.  This  fact  is  the  basis  for  the  linearity  and  Gaussianity  statistical  tests 
developed  by  Subha  Rao  and  Gabr  (1980)  and  Hinich  (1982).  Even  though  Rao 
and  Gabr  first  implemented  Brillingcr's  (1965)  method  for  measuring  the  departure 
of  a  process  from  linearity  and  Gau.ssianity  by  using  bispectrum  estimates  of  the 
observed  time  series,  their  tests  do  not  use  the  a.symptotic  properties  of  the 
bispectrum  developed  by  Rosenblatt  and  Van  Ness  (1965),  Shaman  (1965),  and 
Brillinger  and  Rosenblatt  (1967a,b).  There  are  two  approaches  to  smoothing  sam¬ 
ple  bispectra  to  obtain  consistent  and  asymptotic  (ilaussian  estimators  with  known 
sampling  properties  for  large  samples.  Rao  and  Ciabr  (1980)  use  a  lag  window 
kcrnal  to  multiply  the  sample  third-order  covariance,  or  bicovariance,  array  com¬ 
puted  from  a  sample  of  the  time  series;  this  weighted  covariance  is  then  Fourier 
transformed  to  yield  a  bispectral  estimator.  Hinich  (1982)  applies  a  fast  Fourier 
transform  to  the  data  array,  computes  triple  products  of  the  discrete  complex 
Fourier  coefficients,  and  then  uses  a  two  dimensional  smoothing  filter  in  the 
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bifrequcncy  domain  to  obtain  a  bispectral  estimator  with  known  sampling  proper¬ 
ties.  This  allows  for  the  tradeoff  between  variance  and  bias  of  the  estimator.  The 
Hinich  FTT  approach  uses  fewer  arithmetic  steps  than  Gabr  and  Rao's  lagged 
covariance  products  approach.  Another  element  cf  Ilinich's  faster  computational 
approach  is  the  breaking  of  the  data  record  into  intervals  and  then  averaging  the 
sample  bispcctra  for  the  record  blocks.  Additionally,  Rao  and  Gabr  (1980)  did  not 
develop  test  statistics  for  the  significance  of  individual  bispectral  estimates.  On  the 
other  hand,  the  Hinich  statistical  tests  give  chi-squared  statistics  for  testing  the 
significance  of  the  bispectra.  For  these  reasons,  the  Hinich  bispectral-based  statis¬ 
tical  tests  to  detect  departures  from  nonlinearity  and  non-Gaussianity  are  applied 
to  time  series  data  in  this  research. 

With  a  finite  impulse  response  and  two-frequency  index  notation,  [2.5]  is 
fix  ifufi )  =  A  (/,  )  A  (/j )  A*  {f,  +/2 ).  [29] 

oo 

where  /xj  =  f'E*(r),  A(/)  =  Z«(0cxp(  -ilnfn),  and  A'  is  the  complex  conjugate 

fT  -  n 

of  A.  From  [2.5]  and  [29]  a  functional  relationship  called  the  squared-skewness 
function  of  {jr(0}  is  defined  and  is  the  basis  of  the  Hinich  linearity  and  Gaussianity 
tests: 


S(./j)M/2)S(/,  +/2) 


=  — =  A/,../'2)- 


[30] 


Hence,  [30]  is  a  standardized  third-order  cumulant  spectral  function  as  it  is  the 
square  of  the  bispeefrum  normalized  by  the  power  spectrum  product  of  each  cor¬ 
responding  frequency.  The  degree  of  dependence  between  two  frequencies  is 
measured  by  [30],  If  [2.5]  is  linear  then  [.30]  is  constant  over  all  frequency  pairs 
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(/i./j )  in  the  bispectrum  principal  domain.  The  test  for  Gaussianity  of  the  time 
series  far(/)}  involves  testing  that  [30]  is  zero.  Since  =  0  for  the  Gaussian  case, 
a  non-zero  value  of  the  bispcctrum  rejects  Gaussianity.  Brockett  et  al.  (1987)  has 
a  more  complete  discussion  of  these  tests  and  Hinich  (1982)  has  the  precise  formu¬ 
las  and  proofs  concerning  the  test  for  linearity  and  the  derivation  of  an 
asymptotically  normal  test  statistic  based  on  [30]. 

2.6.4  Cyclo.stationary  Processes  and  Higher-Order  .Spectra 

IIOS  research  studies  show  that  nonlinear  phenomena  can  be  studied  by 
computing  higher-order  spectrum  e.stimates.  Nonstationary  phenomena  can  also 
be  studied  by  computing  higher-order  spectrum  estimates  which  are  not  constrained 
by  the  assumptions  of  stationarity.  For  example,  there  may  be  situations  where  it 
is  beneficial  to  compute  estimates  of  the  second-order  cumulant  spectrum  and  the 
third-order  cumulant  spectrum  rather  than  the  power  spectrum  and  the  bispectrum, 
respectively.  I  his  1  IQS  study  conducted  a  time  scries  estimation  approach  which 
computed  linear  (power  spectrum),  nonstationary  (second-order  cumulant  spec¬ 
trum),  and  nonlinear  (bispectrum)  estimates  of  cyclostationary  processes  for  con¬ 
struction  of  feature  information.  Of  intere.st  in  this  research  is  mechanical  vibration 
monitoring  and  diagnosis  for  rotating  machinery.  In  this  situation,  the  periodicity 
arises  from  rotation,  revolution,  or  rcciprocaiior  of  mechanical  structures  such  as 
shafts,  gears,  pistons,  or  propellers.  This  work  presents  evidence  that  frequency 
support  in  the  second-order  cumulant  spectrum  principal  domain  (2-CSPD)  region 
provides  additional  and  significant  feature  information  to  bispectrum  and  power 
spectrum  features  for  wear  signal  characterization  of  rotating  machinery.  Devel- 
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opment  of  the  principal  domain  regions  for  the  second  and  third-order  random 
variable  cases  is  contained  in  the  next  chapter. 

2.7  Summary 

Pertinent  details  of  the  major  methodologies  investigated  to  develop  a  new 
approach  to  the  research  problem  were  given  in  this  chapter.  Reviews  of  existing 
incipient  fault  detection  techniques  show  an  approach  which  employs  HOS  con¬ 
cepts  needs  investigation.  Measuring  difTerenccs  in  multivariate  populations,  and 
particularly  time  series,  from  a  statistical  perspective  was  di.scusscd.  Feature  ex¬ 
traction  is  the  time  series  discrimination  method  employed  as  the  Gaussianity  as¬ 
sumption  for  an  optimality  approach  does  not  apply  to  the  highly  non-Gaussian 
and  nonlinear  time  series  data  analyzed  in  this  research.  Fven  though  different  as¬ 
sumptions  arc  made  about  the  form  of  the  class-conditional  probability  density 
functions  (pdfs)  used  to  characterize  population  differences,  explanations  in  a 
Bayesian  decision  theory  framework  show  that  the  class-conditional  pdf  is  estimated 
in  a  way  so  similar  values  result  when  the  function  is  evaluated  for  features  from 
the  same  class,  and  widely  differing  values  result  when  evaluated  from  different 
classes.  Statistical  tests  of  the  null  hypothesis  that  the  centroids  of  different  classes 
are  equal  are  used.  These  tests  are  based  on  the  partitioning  of  the  matrix  of 
squared  deviations  of  observational  feature  sets  from  the  sample  centroid  into  ma¬ 
trices  representing  within  and  among  class  components,  riosely  related  to  dis¬ 
crimination  is  classification  which  applies  the  decision  rule  to  assign  a  multivariate 
feature  set  with  unknown  class  membership  to  its  proper  class.  This  research  ap¬ 
plied  linear,  quadratic,  and  4-ncaresl-ncighhor  classifiers.  An  unknown  time  series 
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observation  generated  by  a  cyclostationary  process,  defined  by  “optimum” 
discriminant  feature  sets,  will  be  assigned  to  the  class  which  has  the  highest  classi¬ 
fication  function,  or  posterior  probability  value.  Applying  classification  rules  to 
simulated  and  actual  experimental  time  series  data  of  known  categories  will  result 
in  measures  of  the  classification  power  of  the  rules  and  their  respective  extracted 
feature  sets.  Major  concepts  of  exi.sting  HOS  theory  emphasize  the  importance  of 
justifying  the  u.se  of  certain  limiting  as.sumptions  such  as  linearity,  Gaussianity,  and 
stationarity  of  the  stochastic  process  under  .study.  Use  of  proper  spectral  esti¬ 
mation  procedures  which  include  cumulant  spectrum  estimation  are  investigated  to 
provide  improvement  to  extracted  feature  information  for  cyclostationary  time  se¬ 
ries  di.scrimination  and  classification.  The  development  of  this  new  analytical  ap¬ 
proach  is  contained  in  the  next  two  chapters. 


Chapter  3 

Cumulant  Spectrum  Estimation 


3.1  Introduction 

This  chapter  discusses  the  approach  to  cumulant  spectrum  estimation  of 
periodic  time  series  data  generated  by  physical  systems  such  as  rotating  machinery. 
Cyclostationary  models  are  used  to  represent  these  physical  systems  as  they  contain 
both  deterministic  and  random  components.  The  deterministic  component  is  due 
primarily  to  the  constant  periodic  force  of  the  machine.  The  random  component 
is  due  to  various  sources  such  as  the  randomness  of  the  process  under  study  (ie.  the 
process  of  wear),  different  operating  and  maintenance  conditions  (process  environ¬ 
ment),  and  randomness  of  the  machine  manufacturing  process  (no  two  machines 
are  exactly  the  same). 

In  the  time  domain,  various  orders  of  covariance  functions  can  be  used  to 
describe  random  processes.  Alternatively,  random  processes  can  be  characterized 
by  the  Fourier  transforms  of  these  various  covariance  functions.  This  chapter  pre¬ 
sents  the  key  ideas  underlying  second-order  cumulant  spectrum  c.uimation  for  a 
single  time  series.  .Second-order  cumulant  .spectrum  estimation  is  a  new  procedure 
that  provides  information  beyond  the  bispectrum  and  power  spectrum  estimation 
of  the  experimental  time  .series  data  dc.scribed  in  Chapter  .5.  The  spectral  estimation 
approach  describes  the  estimation  of  stationary,  cyclostationary,  and  nonstationary 
processes  in  an  integrated  manner.  It  has  the  potential  with  further  development 
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for  use  as  tests  for  stationarity  and  periodicity  of  the  observed  time  series  but  the 
primary  interest  in  this  research  is  the  use  of  second-order  cumulant  spectrum  esti¬ 
mates  for  feature  extraction  and  incipient  fault  characterization  of  cyclostationary 
processes. 

This  work  is  different  from  the  previous  HOS  monitoring  study  (Sato, 
1977)  and  other  treatments  of  cyclostationary  processes  (Gladyshev,  1961  and 
Ogura,  1971  and  Hurd,  1969,  1989a,  1989b,  and  Gardner,  1989)  in  that  it  incorpo¬ 
rates  estimation  concepts  of  several  classes  of  stochastic  processes  (nonstationary, 
stationary,  and  cyclostationary)  in  terms  of  spectral  correlation  functions  calculated 
over  a  second-order  cumulant  spectrum  principal  domain  (2-CSPD)  region.  The 
2-CSPD  is  derived  from  symmetry  properties  of  the  Fourier  transform.  There  are 
several  important  properties  of  the  2-CSPD  region.  First,  the  support  of  the 
2-dimensional  cumulant  spectral  measure  for  purely  stationary  processes  is  con¬ 
strained  to  a  diagonal  line  defined  by  yj  =  -/j  in  the  fourth  quadrant  of  the 
2-CSPD  space.  Cramer  (I960)  and  Brillinger  (1965)  state  this  property  so  this  idea 
is  not  new.  Second,  the  support  for  strongly  harmonizablc  periodically  correlated 
processes  (Gardner  and  Franks,  1975  and  Hurd,  1989a,  1989b)  or  purely 
cyclostationary  processes  (PCS)  is  constrained  to  a  set  of  equally  spaced  parallel 
lines  to  this  stationary  support  set  where  the  Euclidean  line  separation  distance 
between  correlated  spectral  components  indicates  the  period  or  cycle  exhibited  in 
the  data  set.  However,  the  new  result  demonstrated  in  this  research  is  that  calcu¬ 
lated  second-order  cumulant  spectrum  estimates  not  on  a  purely  periodic  support 
grid  arc  those  of  concern  and  increased  interest  when  studying  incipient  wear  of 
rotating  machinery.  The  estimated  frequency  support  not  restricted  to  the  funda¬ 
mental  periodic  "omponents  and  their  harmonics  for  both  stationary  and 
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cyclostationary  support  sets  in  the  2-CSPD  represent  intermodulations  of  the 
stochastic  process.  These  modulated  frequency  components  are  proposed  to  pro¬ 
vide  additional  and  better  feature  information  for  inci^j.cnt  wear  signal  character¬ 
ization  than  periodic  spectrum  components. 

3.2  Cumulant  Spectra  and  Their  Estimation 

Let  ,jc(tf,)  be  N  observations  from  a  random  time  series  sampled 

on  M  =  {«  :  n  =  0,1,  —  1,2,  —  2, ... }.  The  {jr  (r„), «  e  M}  are  random  variables  and 
this  notation  is  equivalent  to  T(J), ...  ,X(bf)  with  N  jointly  distributed  random  var¬ 
iables  with  a  common  marginal  distribution  and  zero  mean,  but  lower  case  x  is  used 
in  time  series  literature.  Consider  the  vector  of  time  points 

7  = 

where  M''  =  MxMxMx-M  .  Let  cum(  t/N)  denote  the  Nth-order  cumulant 
of  the  time  series  .sample  of  any  N  dimensional  subset  {x(ti),  jrlh),  ...  ,A:(r„)}  of  the 
jointly  distributed  random  variables.  In  this  notation, 

cum{tl\)  =  /t[A'(/)]  =  /s[j«:(/)]  =  0  for  each  t  because  common  marginal  distrib¬ 
utions  have  zerc  mean.  Strictly  stationary  processes  have  c.um{llN)  depending 
only  on  the  N-1  time  points  denoted  by  the  vector  d  =  (h  -  h,  ...  ,  /«—  h).  Strictly 
periodically  correlated  or  cyclostationary  processes  have  c.um  ( tfN)  depending  only 
on  the  N/T  time  points  where  T  denotes  the  period  or  cycle  and  denoted  by  the 
vector  p  =  (/,, ...  ,  N/r). 
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Consider  the  general  nonstationary  case  and  transform  a  vector  of  time 
points,  t  to  its  spectral  representation,  /  =  (/,  -•■  ,/w).  The  Nth-order  cumulant 
spectrum  is  now  defined: 


CumSJflN}=  ^ct/m(7/yV)exp[-y27r(7'7)].  [31] 

A  strictly  stationary  process  has  the  Nth-order  cumulant  spectrum  replaced  by  the 
N-lth  order  polyspextrum  definition: 

5,(7//V-1)=  ^  cum(;//iV)exp[-/2/r(7'7,-,)]-  [32] 

7eM^‘' 

A  Strictly  periodically  correlated  process  has  the  the  Nth-order  cumulant  spectrum 
replaced  by  the  Nth-order  periodically  correlated  spectrum: 


PCS,{flN)=  YjCum{plN)^xp[-i2ir(p'f)l  [33] 

PeM 


Applying  [32]  to  [31],  the  general  relationship  between  cumulant  spectra 
and  polyspcctra  can  be  expressed  as: 

Cum  S,  if  IN  )  =  -^  (/,  +  -  +f„)S,  [fjN  -  \  ).  [34] 

where  (^(f)  is  the  Dirac  delta  function.  The  RMS  of  [34]  is  zero  for 
,A  b  /  -I-  ...  due  to  the  sifting  property  of  delta  functions.  Therefore. 
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[34]  states  that  correlated  polyspectral  components  that  do  not  sum  to  zero  are 
represented  only  in  the  cumulant  spectrum  principal  domain. 

Similarly,  applying  [33]  to  [31],  a  general  relationship  between  cumulant 
spectra  and  periodic  correlated  spectra  can  be  expressed  as; 

Cum  S,  CfIN)  =  Hoif,  +  -  +  oLf^)  PC  S,  CfjN ),  [35] 

where  (5(a/)  is  the  Dirac  delta  function  and  a  =  -|r  denotes  the  period.  Again, 
since  the  RIIS  of  [351  is  zero  for  a/I  -f-  ayi  +  ...  +  ayV  0,  +  I ,  +  2, ...  due  to 
the  sifting  property  of  delta  functions,  [35]  implies  that  correlated  periodic  spectral 
components  whose  values  do  not  sum  to  zero  or  any  integral  multiple  of  the 
periodicity  are  rcp'^csentcd  only  in  the  cumulant  spectrum  principal  domain. 

Hence,  periodically  correlated  spectra  and  polyspcctra  arc  both  subset.’;  of 
cumulant  spectra.  Stated  differently,  purely  periodic  correlated  spectra  are  con¬ 
strained  to  a  spectral  support  set  that  is  constrained  to  equally  spaced  manifolds 
defined  by  cycle  frequencies  that  sum  to  zero  or  integral  multiples  of  the  period. 
Polyspectra  arc  constrained  to  a  spectral  support  set  where  the  individual  frequency 
variates  sum  to  zero.  In  a  practical  approach  to  demonstrate  these  relationships 
dilTercnt  orders  of  stationarity,  cyclostationarity,  and  nonstationarity  are  investi¬ 
gated.  (Consider  first  the  covariance  structure  for  the  two  random  time  variable  case 
and  its  corresponding  second-order  cumulant  spectrum  domain  representation. 

3.2. 1  Second-Order  Cumulant  Spectrum 

Consider  a  zero-mean  second-order  continuous-time  stochastic  process 


,x  =  ix(l,).  i  e  N)  with 
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E  {jr(/,)jr(f2)}  =  E  {jr  ('i  +  t)}  =  =  RxxO.  t). 

Assume  the  process  is  such  that  the  expected  value  exists  for  all  t  and  t  with 
T  =  6  —  6,  is  not  identically  zero,  and  is  continuous  to  avoid  anomalous  behavior. 
If  the  second-order  cumulant  is  a  function  of  the  time  difference,  t: 


=  Rxx('2-^i)  =  RxxW 


then  the  random  process  x  is  weakly  stationary.  Now,  if  R„(/,  t)  is  periodic  in  t 
with  period  T  for  a  fixed  t,  then  x  is  wide  sense  cyclos'ationary  or  periodically 
correlated,  and  the  second-order  cumulant  function  is: 


“  ^x.ic  (fi  +  7',  /,  -f  r  -I-  t) 
=  Rjj.^  (a/,  at  -f  t) 


with  at  —  t  T  denoting  the  period. 

This  expectation  or  second-order  cumulant  time  function  has  the  following 
second-order  cumulant  spectrum  representation: 


N-  \  V-  1 


k  =  n  I  =  0 


Esing  the  properties  of  exponentiation  and  summing  a  over  all  possible  integral 
multiples  of  the  fundamental  period: 
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1  N-\ 


fc  =  0  /  =  0 


Now,  break  the  triple  sum  into  two  parts,  ot/l  =  —  a/  and  aft  ^  —  of,  to  obtain: 


=  X  E  + 

I  it  =  0 
a  ^ 

lim  /tf-i/v-i 

T^ao  Yj  Y 

I  k  =  0  /  =  0 

1  r 

Close  inspection  of  the  spectral  representation  of  [36]  reveals  some  important  im¬ 
plications: 

1.  If  X  is  a  wide- sense  stationary  random  process,  the  spectral  correlation  is  only 

a  function  of  the  time  shift  parameter,  r.  It  is  independent  of  the  time  param¬ 
eter,  t,  and  also  the  period  parameter,  a,  so  the  second  term  of  [36]  vanishes 
or  is  zero.  Furthermore,  the  period  parameter  does  not  exist  in  the  spectral 
correlations.  This  is  possible  only  if  the  random  complex  amplitudes,  Xi  and 
X,  are  uncorrelated  or  /:[X,  X]  =  0  for  all  /*  Thus,  stationary 

processes  will  have  spectral  correlation  support  in  the  second-order  cumulant 
spectrum  principal  domain  (2-CSPD)  region  constrained  to  /*  =  —  f,. 

2.  If  X  is  cyclostationary,  spectral  correlations  are  non-zero  at  integral  multiples 
of  a.  This  first  implies  that  difTerent  random  complex  amplitudes  are  con¬ 
strained  to  specific  portions  of  the  diagonal  stationary  support  line  in  the 
2-('SPD  defined  by  ct/i  ^  -  a/.  This  correlation  is  defined  as  the  first  term  of 
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[36]  and  has  stationary  characteristics.  Periodic  correlation  components  are 
also  created  from  the  second  term.  The.se  are  a  function  of  at  and  are  o/f  the 
support  set  defined  by  a/*  =  —  afi,  but  rather  are  constrained  to  a  support  grid 
in  the  2-CSPD  defined  by;  afk  ~  a/,.  So,  cyclostationary  processes  also 
have  nonstationary  characteristics  and  provide  a  “bridge”  between  stationary 
and  nonstationary  processes.  In  fact,  cyclostationary  processes  are  tractable 
with  generalizations  of  the  tools  used  to  study  stationary  processes  (Cramer, 
1960,  Gardner  and  Franks,  1975,  and  Hurd,  1989a  and  1989b). 

3.  If  X  is  nonstationary,  non-zero  spectral  correlation  may  occur  not  only  at  inte¬ 
gral  multiples  of  at  but  rather  for  any  t.  Note  that  this  general  class  includes 
cyclostationary  and  stationary  processes  as  a  subset.  Also,  only  the  general 
second-order  cumulant  spectrum  representation  captures  spectral  correlations 
beyond  the  various  support  sets  defined  by  /*  =  -  fi,  a/,  =  -  af,,  and 
afi,  -  af,.  Correlated  frequency  components  that  are  a  function  of  any  t 
represent  modulations  of  the  purely  periodic  interacting  components.  Also,  the 
second  term  of  [36]  which  has  the  periodic  frequency  components  multiplied 
by  the  sinusoid  also  represent  modulations  and  has  support  off  the  purely 
cyclostationary  grid  (sec  Figure  3.1  on  page  72).  These  modulated  dependent 
frequencies  are  proposed  as  more  u.seful  than  single  or  coupled  harmonic  tones 
in  characterizing  random  incipient  wear  processes  of  rotating  machinery. 

It  is  shown  how  cyclostationary  processes  have  both  stationary  and  nonstationary 
characteristics.  More  importantly,  the  investigation  of  a  more  powerful  feature 
characterization  of  periodic  signal  data  for  wear  discrimination  and  classification 
requires  cumulant  spectrum  estimation.  The  2-CSPD  region  on  which  estimates  are 
computed  for  the  joint  random  complex  amplitude  distribution  is  now  developed. 
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Figure  3.1:  Spectrum  Broadening  Due  to  Modulation  --  “Sidebands” 
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3.2. 1.1  Principal  Domain  Development 

Without  assuming  weak  stationarity  the  continuous,  second-order  cumulant 
spectral  representation  of  a  stochastic  process  is; 


CumS(f,J^) 


rr 


[37] 


This  cumulant  spectrum  equation  defines  each  estimated  quantity  over  the  entire 
(f\,fi)  planar  region.  However,  it  is  not  necessary  to  compute  the  second-order 
cumulant  spectrum  over  this  entire  frequency  plane  as  the  Fourier  transform  pos¬ 
sesses  two  important  symmetry  properties:  permutation  and  conjugation.  The 
2-CSPD  is  defined  as  the  minimum  space  on  which  second-order  cumulant  spec¬ 
trum  estimates  arc  computed.  The  following  four  step  process  can  describe  the  PD 
development  for  any  order: 

1.  Apply  permutation  symmetry  of  Fourier  transform  (complex  variates). 

2.  Apply  conjugation  symmetry  of  Fourier  transform  (real  variates). 

3.  Combine  permutation  and  conjugation  symmetry  operations. 

4.  Bandlimit  and  properly  sample  the  process. 

fhe  2-('SPD  is  now  developed  with  a  corresponding  graphical  depiction. 

Consider  CumS„  (  )  as  an  estimate  of  the  true,  second-order  cumulant 

spectrum  which  is  based  on  the  continuous  Fourier  transform  of  a  large  but  finite 
record  length,  T,  of  the  process  x: 


CumS,,  (/,  ,/2 )  =.  -L  /,■  ( ..V(/, )  X(f, )} . 


[38] 


Now  consider  CumS,,  ): 


CumS^if,,/,  )  =  ^  E  {X{f,)  X(f,  )} 

=  yE{X(/,)^(/2)} 
CumS^{f^J^)  =  CumSj„(/,,/2). 


Because  of  this  permutation  symmetry,  a  45”  line  divides  the  entire  plane  into 

two  equivalent  half-planes.  See  Figure  3.2. 


('onsidcr  only  the  right  half-plane.  The  complex  2-CSPD  is: 

{/i-./i  •  -oo  s/i  ^  00./2  ^/il- 


Now  consider  CumS..  (  —ft,  —  f  ): 
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Cums„i  -yi,-/a)  =  -A)X{  -A)) 

=  £{r(/,)r(/j)} 

CumS^  (  -/„  -/2  )  =  rumS'M  Ji  )• 

where  •  denotes  the  conjugate  operation.  This  is  the  conjugation  symmetry  prop¬ 
erly.  It  exists  because  for  x  a  real-valued  stochastic  process,  X{—f)  =  X'{f).  So, 
cumulant  spectrum  values  in  quadrants  I  (11)  are  equivalent  to  those  estimates 
computed  in  quadrant  111  (IV).  Consider  only  quadrants  I  and  IV.  See 
Figure  3.3. 
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Figure  3.3:  Real  Second-Order  Cumulant  Spectrum  Principal  Domain  (I) 

Thus,  the  2-CSPD  for  a  real-valued  process  is: 

{/l-/2  :  0  r^/,  oo, -CO  </2  ^  oo}. 


Now  consider  CurnS,,  (  —  —  /i ): 
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CumS^  (  -A.  -f\  ) 


CumS^i  -A, -A) 


-A)n  -/,)} 

-^E{X{A)X{f,)) 

Cumtj,AJi)- 


This  property  is  due  to  the  combination  of  the  permutation  and  conjugation  prop¬ 
erties  of  the  Fourier  transform.  It  states  that  the  45“  line  in  quadrant  I  and  the 
-  45°  line  in  quadrant  IV  arc  both  lines  of  symmetry.  Thus  the  original  two  fre¬ 
quency  plane  space  (A, A)  has  been  halved  by  the  permutation  property  and  then 
this  half-space  is  split  into  halves  again  (see  Figure  3.4). 


i 

i  f  2  f  1  =  f  2 

* 

* 

* 

Figure  3.4:  Real  Second-Order  C 

f  1  =  -  f2 

umulant  Spectrum  Principal  Domain  (11) 

llencc,  both  symmetry  properties  result  in  slicing  the  original  planar  region  down 
into  a  quarter  of  its  size  and  the  reduced  2-CSPD  for  a  real-valued  x  is  now: 


{AJi  ■■  o<A<o^,  I/2I  s/,}. 
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In  actual  implementation,  the  second-order  cumulant  spectrum  is  evaluated 
numerically  through  a  digital  signal  processing  scheme.  Hence,  to  satisfy  the  well- 
known  Nyquist  sampling  theorem,  the  stochastic  process  is  bandlimited  and  sam¬ 
pled  at  of  least  twice  the  highest  frequency  component,  f„  to  prevent  aliasing. 
Consequently,  when  the  auto  second-order  cumulant  spectrum  is  computed,  it  is 
only  necessary  to  compute  estimates  which  reside  in  the  triangular  discrete  2-CSPD 
region  (see  Figure  3.5)  defined  by; 

{/1./2  :  0<  I/, I 


Figure  3.5:  Di.screte  Second-Order  Cumulant  Spectrum  Principal  Domain 


The  same  four  step  process  can  be  applied  to  each  succeedingly  higher  di¬ 
mension.  For  the  third-order  cumulant  spectrum: 


000 

^  ^  dt^dt,,  [39] 


—no  ''  — <x> 


each  estimated  quantity  is  defined  over  the  entire  )  cubic  frequency  space. 

However,  estimates  need  only  be  computed  in  the  discrete  3-CSPD  region  defined 
by: 

{/,./2./3  :  0<  I/3 1  </2  </,</,). 

The  graphical  depiction  of  principal  domain  regions  in  higher  dimensions  is  more 
difficult.  Fortunately,  a  picture  is  not  required  by  the  computer  to  calculate  the 
estimates. 

3.2. 1.2  Estimation  Procedure 

This  section  describes  the  computational  procedure  to  compute  estimates 
for  the  second-order  cumulant  spectrum  (2-CS)  from  the  direct  discrete  Fourier 
transforms  (DFTs)  of  finite  record  lengths.  The  procedure  is  basically  an  extension 
of  procedures  for  computation  of  traditional  power  spectra  and  it  will  be  shown 
that  2-CS  estimates  are  computed  from  multiplying  two  di.scrcte  complex  amplitude 
spectral  density  functions.  The  same  procedure  can  be  followed  for  computing  es¬ 
timates  for  the  third-order  cumulant  spectrum  (3-CS)  except  that  three  complex 
amplitude  spectral  density  functions  arc  multiplied  and  are  computed  over  a  differ¬ 
ent  principal  domain  region. 

Begin  with  [38]  which  expre.sses  CumS„  (./I,/: )  in  terms  of  expectations  of 
two  random  complex  amplitude  functions:  X{f,)  and  X{fj).  Performing  the  ex¬ 
pectation  operation  of  random  variables  requires  knowledge  of  an  appropriate 
probability  density  function  which  in  most  experimental  studies  is  unknown.  Thus, 
ensemble  averaging  over  a  sequence  of  sample  2-(JS  is  the  approach  for  estimating 
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the  expectation  of  the  two  complex  random  variables.  Consider  N  identical  inde¬ 
pendent  trials  or  runs  of  an  experiment.  Each  run  yields  an  outcome  denoted  by 
where  k  =  1,2, ,  N.  The  collection  of  individual  realizations  defines  the 
ensemble  of  a  particular  stochastic  process.  If  it  is  impractical  or  too  costly  to 
gather  a  large  number  of  realizations  of  a  particular  stochastic  process  by  repeating 
the  experiment  N  times,  there  is  another  way  to  obtain  more  samples.  If  the  time 
duration  of  the  original  data  record  is  long  enough,  it  may  be  subdivided  into  indi¬ 
vidual  frames  of  sufficient  length  to  maintain  independence  and  subsequently  im¬ 
prove  the  quality  of  the  estimates.  Of  extreme  importance  with  processing  time 
series  data  from  periodic  phenomena  is  definition  of  the  frame  length  as  an  integral 
multiple  of  the  fundamental  rotational  frequency  of  the  process  being  studied. 
Also,  another  constraint  when  characterizing  incipient  faults  is  to  capture  at  least 
two  fundamental  periods.  This  is  the  lower  bound  as  frequency  resolution  of  sam¬ 
pled  estimates  is  given  by  V/  =  l/F  where  T  is  the  processed  record  length.  Frame 
lengths  defined  in  this  manner  will  capture  modulations  of  the  specified  periodic 
frequency  which  have  been  found  to  help  characterize  incipient  wear  states.  No 
data  windowing  techniques  arc  needed  to  reduce  the  effects  of  leakage  if  an  integral 
number  of  periods  or  cycles  arc  captured  with  the  defined  frame  lengths. 

So  consider  the  set  of  N  realizations  or  records  for  x,  each  record  being  T 
seconds  long.  Each  realization  sampled  with  sampling  interval  /„  and  consists  of 
N  -  I'lt,  samples.  T  and  N  arc  determined  from  considerations  previously  dis¬ 
cussed.  Let  represent  the  k"'  sampled  realization  and  yV'*'[/]  the  corre¬ 

sponding  DFT  of  x'*’[«].  On  the  basis  of  [38],  the  appropriate  estimate  of  the 
sample  2-OS  is: 
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CumS,,{f,J^)  =  4r^r’(/.)4‘^(/2)-  [40] 


('onsider  the  values  of  the  CFTs  at^J  =  /V/  where  V/  =  1/7.  Then  [38]  becomes: 


Now,  the  sample  values  of  the  CFTs  in  terms  of  their  respective  DFTs  are: 


V/ 


[41] 


so  expression  of  the  continuous  auto  2-CS  in  terms  of  its  discrete  auto  2-CS  is: 


CumS^ll^,  Ij] 
V/2 


[42] 


Substituting  [41]  and  [42]  into  [40]  to  obtain  the  sample  discrete  auto  2-CS: 


Cu^5W[/„4] 

S/f 


_  J_  v<*)  y<*)  11. 

T  '^r  V/- 


The  final  estimate  of  the  discrete  auto  2-CS  is  found  by  ensemble  averaging  over 
all  N  realizations  of  the  stochastic  proccs*:  studied: 


N 

CMm5„[/|,/j]  =  -L  ^  CMm.si*'[/|,/2]. 

k  =  I 


[43] 
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Note  that  if  x  =  t  e  N}  is  in  volts,  then  the  2-C5  estimate  has  the  dimensions 
Cl  volts^  and  represents  spectral  components  of  two  dimensional  bandwidth  cen- 
te  ed  at  /  aiid  /  contributing  to  the  mean  square  value  of  the  stochastic  process. 

It  is  important  ta  mention  that  estimates  of  succeedingly  higher  orders  of 
cum  -lant  spectra  are  found  by  a  procedure  similar  to  the  one  described  for  the 
secor.  1-  rd  r  case.  The  only  difference  is  the  number  of  DFTs  being  multiplied  over 
a  corrcspon  n»ly  dimensioned  PD  region.  Pseado-code  for  second-order  cumulant 
estimation  of  cyclostationary  time  series  is  given  below  and  the  actual  FORTRAN 
program  is  at  Appendix  A. 

Procedure  Second-Order  Cumulant  Cstimatiuii 
Receive  valid  ti.me  senes  input  parameters  (#  .samples,  sample  rate,  block  length) 
Load  time  series  data  into  FFT  work  arrav 

Calculate  statistics  (moments  and  cumulants)  and  subtract  mean  from  data 
While  data  blocks  exist  do 

a)  Subtract  block  nican  from  data 

b)  Perform  DFT 

c)  Calculate  Second-Order  Cumulant  Spectrum  (double  complex  product) 

d)  Accumulate  Chi-Squared  Statistics  over  2-CSPD 

End  Do 

Output  block  summar}'  statistics  and  correlations  and  (ilobal  Statistics 
END 

Valid  parameters  imply  that  sufTicicnt  working  storage  space  is  defined  to 
handle  the  amount  of  samples  in  the  time  scries  and  also  the  chosen  block  length 
is  at  least  twice  the  fundamental  frequency  of  the  system,  fwo  or  more  funda¬ 
mental  periods  in  a  processed  data  block  will  calculate  second-order  cumulant  es- 
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limates  which  correspond  to  intermodulation  efTects  or  nonstationary 
characteristics  of  the  random  fault  mechanism.  Breaking  the  entire  time  series 
record  into  proper  block  lengths  is  a  method  to  increase  estimation  reliability  by 
decreasing  estimate  variability. 

3.2.  t. 3  Reliability  of  the  Estimates 

Because  of  inter-machine  variability,  time  series  estimates  of  the 
stochastic  process  under  study  are  not  perfect.  Estimator  quality  is  usually  de¬ 
fined  by  its  bias  and  variance.  Bias,  b,  is  the  difference  in  expected  value  of  the 
estimator  and  the  “true"  value: 


h  =  £1^]  -  4), 

and  so  an  estimator  is  unbiased  if  =  4>-  Variance  is  the  spread  in  value 
about  the  expected  value: 

V14>'\  =.  o'  =  £C(^  -  £[«)'] 

=  -  2^E[,h  +  C'W]] 

=  -  2£I^]£[0]  +  E\h 

=  £10']  -  £'[0]. 

So,  estimator  variance  is  equal  to  mean  .square  value  of  the  estimator  minus  the 
square  of  the  mean  of  the  estimate.  Now,  consider  mean  square  error  which  is 
defined  as  the  following  and  can  be  derived  through  expansion  of  the  expectation 
operation: 
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=  EC(<i>  —  =  <7^  4-  b^. 

Assume  that  the  realizations  are  independent  and  that  the  process  is  ergodic. 
Then  with  bar  notation  meaning  expectation; 


CumS^[/,.4]  = 


N 


k  =  1 
N 


k  =  I 


=  A^C«m5„C/„/2] 


and  hence  the  estimator  in  [42]  is  unbiased.  Also,  [42]  has  a  variance  given  by: 
Kar{CMm.i,[/,./2]}  =  C»mS^[ /„ /^  ], 


so  that  the  variance  decreases  with  the  number  of  realizations.  This  agrees  with 
one's  intuition  that  averaging  of  more  realizations  of  random  processes  creates 
better  estimators. 


Chapter  4 

HOS  Feature  Extraction 


4.1  Introduction 

A  feature  extraction  algorithm  is  developed  to  exploit  the  additional  in¬ 
formation  provided  by  the  power  spectrum  and  the  HOS  transformations  of  raw 
time  series  data.  Several  thousands  of  spectral  estimates  are  usually  generated  for 
a  particular  time  series  analysis  application  so  a  finite  subset  of  the  spectral  esti¬ 
mates,  or  features,  are  chosen  from  the  entire  collection  of  spectral  measurements 
to  provide  “optimum”  classification  results.  There  are  three  reasons  for  investi¬ 
gating  alternative  estimation  and  feature  extraction  methods  and  evaluating  their 
classification  results  rather  than  using  only  one  type  of  spectral  analysis  approach. 
First,  there  is  a  cost  for  performing  many  different  types  of  spectral  estimation 
procedures  and  their  subsequent  feature  extraction.  Actual  computing  time  to 
perform  HOS  estimation  is  not  the  most  prohibitive  factor;  it  is  the  feature  ex¬ 
traction  process  and  subsequent  clas.sification  that  takes  time  and  additional 
computing.  It  is  possible  that  power  spectrum  c.stimation  and  feature  extraction 
may  provide  sufficient  classification  performance  for  a  particular  application. 
However,  performing  HOS  estimation  and  feature  extraction  can  be  worthwhile 
if  it  improves  the  overall  classification  performance  of  the  power  spectrum-ba.sed 
approach.  Proper  HOS  transformations  of  the  raw  time  series  can  lead  to  more 
effective  decision  surfaces  because  of  the  more  accurate  representation  of  the 
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stochastic  process  structure  being  studied.  Second,  reducing  the  dimensionality 
of  the  feature  space  eliminates  redundancy  as  variables  which  do  not  add  to  the 
classification  accuracy  are  not  included  in  the  final  decision  rules.  Third,  a  lower 
misclassification  rate  is  sometimes  achieved  by  using  fewer  feature  variables. 
Further  discussion  of  this  topic  is  in  the  next  section.  The  significant  outcome  of 
the  feature  extraction  process  is  the  exposure  of  the  individual  spectral  feature 
variables  and  their  combinations  in  mea.suring  differences  of  multivariate  popu¬ 
lations.  Thus,  the  HOS  feature  extraction  approach  is  not  only  multivariate,  but 
also  multispectral. 


4.2  Features  and  Their  Relationship  to  Misclassification  Rate 


Previous  pattern  recognition  research  observed  for  a  given  design  set,  in¬ 
creasing  the  number  of  feature  variables,  d,  causes  classification  performance  to 
initially  improve,  but  then  to  deteriorate  (Hand,  1981).  This  occurs  because  the 
decision  surface  better  fits  the  design  set  with  increasing  d  but  generalizes  less  well 
to  new  samples  since  the  design  set  became  more  sparsely  distributed  and  less 
rcpre.sentative  of  the  class-conditional  pdfs.  Hand  (1981)  explains  this  phenome¬ 
non  using  Hotelling's  7'^  statistic. 

Hotelling's  7'^  statistic,  the  distance  between  two  sample  means  relative 
to  the  dispersion  within  the  samples  is: 


-r-2 

T  = 


Yl\TX’\  5 

■*■2)'  (-^1  - 
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where  is  the  mean  for  class  cj,,  V  is  the  assumed  common  variance-covariance 
matrix,  and  O’  is  the  squared  Mahalanobis  distance  measure  defined  in  the  back¬ 
ground  chapter.  To  investigate  multivariate  population  difTerences,  the  question 
asked  is  how  often  is  T’  observed  as  large  or  larger  than  the  7’’  estimated  from  the 
samples  if  the  two  populations  are  identical?  The  statistical  criterion  value  de¬ 
fined  by; 


J  = 


n  —  \  —  d 
{»~2)d 


is  compared  with  the  F  distribution  with  d  and  (n-l-d)  degrees  of  freedom.  If  the 
probability  of  a  large  T’  is  sufficiently  low,  one  can  conclude,  with  a  certain  risk 
of  error,  that  the  populations  are  distinct. 

Now,  7’’  changes  as  d,  number  of  measurement  or  feature  variables,  in¬ 
creases  (Liddell,  1977).  Consider  each  feature  variable  as  independent  of  every 
other  feature  variable  and  the  standardized  difference  between  the  sample  means 
is  some  constant,  k  ,  for  each  variable.  This  allows  D’  =  k^d  and  thus; 

y  =  («  —  1  —  d)n^n2k^dl(n  —  2)dn 

—  — '/ - - d  -  —  .  1_44J 

(n  ~  2)n  {n  —  2)n 

[44]  is  a  linear  function  of  d,  decreasing  as  d  increases. 

Van  Ness  and  Simpson  (1976)  and  Van  Ness  (1979)  studied  the  rate  at 
which  7)’  must  increase  as  d  increases  in  order  to  maintain  a  constant  or  decreas¬ 
ing  error  rate.  They  analyzed  data  from  normal  populations  and  compared  three 
parametric  and  two  non-parametric  classification  algorithms.  For  each  classifier, 
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they  produced  plots  to  determine  the  discriminatory  power  lost  by  increasing  d 
with  fixed,  and  how  much  must  increase  in  order  to  justify  increasing  d. 
Their  results  showed  that  the  non-parametric  algorithms  were  quite  stable  at  high 
dimensions,  and  also  outperformed  the  parametric  algorithms  at  smaller  dimen¬ 
sions.  Nevertheless,  feature  extraction  is  necessary  to  “squeeze”  the  most  infor¬ 
mation  from  a  stochastic  process  with  the  least  amount  of  variables.  Some 
existing  feature  extraction  approaches  were  examined  for  use  in  this  work. 

4.3  Existing  Feature  Extraction  Approaches 

Algorithmic  approaches  for  finding  a  feature  space  spanned  by  a  subset 
of  the  original  measurement  space  are  categorized  into  two  major  areas:  selection 
and  transformation.  Feature  variable  -selection  is  appropriate  if  cost  or  other 
factors  present  prevent  all  of  the  original  .set  of  features  to  be  measured  and  used; 
it  is  a  combinatorial  analysis  problem.  When  all  the  variables  can  be  measured, 
variable  transformation  is  performed  but  increased  reliability  occurs  if  a  lower  di¬ 
mensional  space  is  used.  Variable  transformation  approaches  include  linear  and 
non-linear  techniques.  Both  approach  categories  as.sume  the  number  of  potential 
features  is  much  less  than  the  number  of  training  samples.  This  was  not  the  case 
with  experimental  time  series  cases  analyzed  in  this  research.  Consequently,  a 
hybrid  approach  was  developed  in  this  work.  Before  describing  this  hybrid  ap¬ 
proach,  existing  selection  and  transformation  methods  are  given  as  some  of  their 
aspects  arc  incorporated  in  the  IIOS  feature  extraction  process. 

-Selection  of  a  sub.sct  from  the  complete  set  of  variables  is  approached 
with  exhaustive  search,  branch  and  bound,  and  stepwise  methods.  Fxhaustivc 
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search  methods  are  only  feasible  when  d  is  quite  small.  The  major  problem  with 
exhaustive  search  methods  is  how  to  test  the  many  possible  sets  without  the  large 
number  of  tests  invalidating  the  significance  level  of  each  test.  Branch  and  bound 
algorithms  accelerate  the  search  of  all  variable  sets  but  do  not  explicitly  evaluate 
all  of  them.  Branch  and  bound  is  usually  used  on  problems  where  the  number  of 
possibilities  evaluated  increases  exponentially  with  some  fundamental  parameter 
of  the  problem.  Unfortunately,  even  though  branch  and  bound  techniques  slow 
down  the  growth  rate  of  possibilities,  it  remains  exponential.  Thus,  suboptimal 
search  methods  such  as  sequential  forward  selection  and  backward  elimination 
approaches  are  also  used.  Kittler  (1978)  gives  empirical  comparisons  of  these  two 
stepwise  methods  and  extensions  such  as  his  generalized  plus  1-take  away  r  se¬ 
lection  algorithm.  This  approach  finds  the  particular  1-dimensional  subset  of 
those  variables  not  yet  added  which,  when  combined  with  the  current  set,  leads 
to  the  greatest  J  statistic  [44].  Then  each  step  examines  the  selected  set  to  identify 
those  r  variables,  when  discarded,  reduce  J  by  the  least.  His  general  conclusions 
were  that  selection  and  backward  selection  methods  which  select/reject  several 
variables  simultaneously  were  better  than  methods  which  select/rcject  one  variable 
at  a  time.  Additionally,  forward  selection  of  two  variables  and  backward  deletion 
of  one  variable  gave  the  best  results  and  was  computationally  favorable  with 
branch  and  bound  methods.  Since  stepwi.se  methods  could  continue  indefinitely 
if  computation  time  is  not  a  constraint,  stopping  criteria  such  as  a  test  statistic 
given  by  Rao  (1970)  tests  whether  an  extra  {d  ~  cf)  variables  makes  a  significant 
contribution  to  the  discrimination  ta.sk. 

Variable  transformation  methods  include  canonical  discriminant  analysis 
((M)A)  which  finds  a  set  of  axes  spanning  a  subspacc  of  the  complete  space  where 
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class  separability  is  maximized.  CDA  is  done  in  a  similar  fashion  as  principal 
component  analysis  (PCA)  for  summarizing  total  variation.  But,  with  PCA,  only 
one  data  set  is  analyzed,  while  CDA  analyzes  at  least  two  data  sets.  PCA  also 
subtracts  means  so  it  is  an  analysis  of  variance/  covariance.  CDA  also  does  a 
PCA  of  class  variable  means.  Variables  used  for  canonical  discriminant  compu¬ 
tation  need  to  have  an  approximate  multivariate  normal  distribution  within  each 
class  and  a  common  covariance  matrix.  However,  a  linear  discriminant  boundary 
may  be  determined  by  a  least  squares  argument  without  the  assumption  of 
normality  and  common  dispersions  of  the  two  parent  distributions  (Kendall  et  al., 
198.1).  If  it  is  the  case  that  the  multivariate  normality  assumption  is  unjustified, 
non-linear  feature  extraction  methods  have  also  been  developed  (Fukunaga  and 
Ando,  1977), 

4.4  [New  Hybrid  Approach 

When  the  number  of  potential  feature  variables  is  much  greater  than  the 
sample  size,  a  hybrid  approach  that  attacks  such  problems  in  stages  is  necessary 
(Jain  and  Dubes,  1978).  The  HOS  feature  extraction  algorithm  is  composed  of 
three  stages.  First,  visual  plots  of  ensemble  averaged  spectra  and  their  differences 
between  groups  are  generated  after  each  respective  spectral  estimation  process  to 
obtain  a  rough  idea  of  which  estimates  to  use  as  possible  feature  variables.  This 
is  the  graphical  variable  selection  stage,  llinich  and  Clay  (1968)  describe  the  gen¬ 
eral  procedures  followed  for  frequency  domain  estimation  of  a  time  series  record. 
Jhe  statistical  tests  of  linearity  and  Ciaussianity  of  a  time  series  (llinich,  1982)  arc 
extended  for  use  with  second-order  cumulant  spectrum  estimates.  The  modulus 
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of  the  second-order  cumulant  spectrum  and  bispectrum  estimates  are  statistically 
transformed  to  chi-square  values  for  subsequent  use  as  feature  measurements. 
Hinich  (1989)  describes  the  statistical  transformation  process  for  bispectra  moduli. 
All  types  of  spectral  estimates  are  ensemble  averaged  for  the  frequency  variates 
of  the  particular  spectral  function  after  estimation  of  all  time  series  records  used 
for  training  function  computation  is  performed.  Second,  univariate  analyses  of 
variance  are  performed  and  the  resulting  F-statistics  are  plotted  for  the  corre¬ 
sponding  frequency  principal  domains  of  each  spectra  type  to  confirm  visual  dif¬ 
ferences  seen  in  the  ensemble  averaged  plots.  Spectral  variables  shown  to  be  good 
candidates  for  the  feature  set  are  selected  based  on  their  F  value.  Only  the  top  ten 
of  each  spectra  type  are  chosen  as  potential  feature  variables.  This  second  stage 
is  a  dimension  reduction  step  to  reduce  each  individual  spectral  space  to  represen¬ 
tative  variables.  Third,  a  conventional  variable  selection  algorithm,  stepwise  se¬ 
lection  of  variables  available  in  SAS  6.0,'  a  statistical  analysis  software  package, 
is  applied  using  the  thirty  spectral  variables  identified  from  the  second  stage. 
Stepwise  discriminant  analyses  are  performed  to  obtain  the  best  linear  (power 
spectrum)  discriminators,  the  best  nonstationary  (second-order  cumulant  spec¬ 
trum),  the  best  quadratic  (bispectrum)  discriminators,  the  best  linear  and  nonsta¬ 
tionary  discriminators,  etc.,  so  the  important  relationships  of  the  reduced  and 
combined  spectral  feature  space  are  considered.  The  “optimum"  individual  spec¬ 
tral  feature  sets  composed  of  ten  potential  feature  variables,  either  power  spec¬ 
trum,  hispectrum,  or  second-order  cumulant  .spectrum  feature  .sets,  are  found  to 
be  average  discriminators  by  themselves.  However,  when  the  different  spectral 
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feature  sets  are  combined  according  to  certain  statistical  criteria,  their  discrimi¬ 
nation  and  classification  power  significantly  increase  (see  Chapter  5). 

This  hybrid  feature  extraction  approach  generates  ensemble  averaged 
plots,  F-test  plots,  and  “optimum”  feature  sets.  Once  feature  extraction  for  the 
various  types  and  combinations  of  spectra  is  complete,  marginal  and  sensitivity 
studies  are  conducted  on  simulated  and  actual  time  series  to  test  and  evaluate  the 
different  approaches. 

Before  the  MOS  feature  sets  are  presented  and  discussed  for  the  simulated 
and  actual  experimental  data,  statistical  test  results  of  raw  time  series  from  the 
actual  wear  database  are  given.  Bispectrum  statistical  tests  are  employed  to  in¬ 
vestigate  if  the  observed  time  series  records  are  consistent  with  the  hypothesis  that 
the  underlying  stochastic  process  has  a  Gaussian  distribution,  and  whether  the 
process  contains  evidence  of  nonlinearity  in  the  underlying  physical  mechanisms 
generating  the  observed  vibrations.  The  sample  bispectrum  is  the  two  dimensional 
Fourier  transform  of  the  expected  value  of  the  vibration  signal  at  three  time 
points,  and  should  be  a  standardized  normal  random  variable  if  the  process  is 
stationary,  linear,  and  Gaussian  (Hinich,  1982).  Shown  at  Table  4.1  on  page  92 
and  Table  4.2  on  page  92  are  the  re.sults  from  applying  the  Hinich  linearity  and 
gaussianity  te.sts  to  the  two  bit  cla.sses  for  each  stack/load  time  scries. 
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Table  4.1 

Gaussianity  and  Linearity  Test  Statistic  Results  For  New  Bits-Actual  Wear  Ex¬ 
periment.  The  Z  statistic  is  a  normal  approximation  of  the  central  chi-squared 
variate  with  large  degrees  of  freedom.  It  is  a  N(0,1)  random  variable  under  the  null 
hypotheses  of  a  Gaussian  and  a  linear  process. _ 


Time  Series 
(Stack/Load) 

Gaussianity  Statistic  (Z) 
Mean  Std 

Linearity  Statistic  (Z) 
Mean  Std 

NlP/3 

54.2 

39.7 

54.8 

42.4 

NlP/4 

3.56.4 

212.9 

367.2 

216.2 

6S2P/3 

40.8 

32.1 

40.2 

35.8 

6S2P/4 

186.9 

183.9 

192.4 

189.4 

Table  4.2 

Gaussianity  and  Linearity  Test  Statistic  Results  For  Slightly  Used  Bits--Actual 
Wear  Experiment.  The  Z  statistic  is  a  normal  approximation  of  the  central  chi- 
squared  variate  with  large  degrees  of  freedom.  It  is  a  N(0,l)  random  variable  under 
the  null  hypotheses  of  a  Gaussian  and  a  linear  process. _ 


Time  Series 
(Stack/Load) 

Gaussianity  Statistic  (Z) 
Mean  Std 

Linearity  Statistic  (Z) 
Mean  Std 

NIP/3 

105.4 

67.4 

108.4 

69.3 

NlP/4 

3.58.7 

326.1 

367.0 

332.4 

6S2P/3 

56.8 

43.9 

.56.5 

47.9 

6S2P/4 

237.3 

229.3 

229.9 

192.8 

These  global  test  statistics  show  that  the  drill  spindle  vibration  time  scries 
for  the  Z  accelerometer  are  definitely  non-Gaussian  and  nonlinear  for  both  new  and 
slightly  used  drill  bits.  Also,  the  cn.scmblc  averages  and  standard  deviations  of 
both  statistical  measures  arc  higher  for  slightly  used  bits  than  new  hits.  Possibly, 
the  increased  nonlinear  and  non-Gaussian  structure  of  slightly  used  bit  spindle 
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vibrations  are  due  to  bit  wear  mechanisms  such  as  flank  or  rake  wear  changing  the 
geometry  of  the  bit  cutting  surface  and  causing  the  thrust  forces  to  increase. 
Possibly  incipient  bit  wear  causes  more  significant  frequency  coherence  at  certain 
frequency  components.  ALso  Table  4.1  on  page  92  and  Table  4.2  on  page  92  re¬ 
veal  as  more  panel  stack  material  is  cut  with  each  revolution  of  the  drill  (4  mil/rev 
versus  3  mil/rev),  higher  statistical  values  are  obtained  which  correspond  to  the 
increased  “strength”  of  the  interacting  frequency  components.  Thus,  these  global 
statistical  measures  are  corresponding  to  the  actual  physics  of  the  circuit  card 
cutting  process. 

4.5  Results 

Hven  though  th  iobal  Ilinich  statistical  measures  are  indicative  of  class 
distinguishability,  feature  extraction  emphasis  is  on  the  statistical  selection  of 
particular  linear,  nonstationary,  and  nonlinear  spectrum  estimates  for  subsequent 
input  to  an  efficient  classification  algorithm.  The  ultimate  aim  is  to  reveal  features 
of  a  consistent  relationship  that  have  good  classification  performance.  The  se¬ 
lected  features  can  then  potentially  provide  a  deeper  understanding  of  the  physics 
of  a  particular  physical  process  under  study.  Hence,  the  desired  properties  of  ex¬ 
tracted  features  in  order  of  priority  are; 

1.  consistent  classification  performance. 

2.  physically  interpretable  with  some  correspondence  to  the  physics  of  the 
stochastic  process,  and 

3.  good  visual  discrimination  ability. 
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Feature  extraction  results  are  given  generally  for  the  simulation  scenarios 
and  then  details  for  all  the  database  partitions  of  the  actual  experiment  are  given 
in  tabular  format.  (The  simulated  and  actual  wear  experiments  are  described  in 
Chapter  .5).  Feature  extraction  results  will  confirm  the  research  hypothesis  that 
HOS  features,  and  particularly  estimates  of  the  second-order  cumulant  spectrum 
not  part  of  the  purely  periodic  support  grid  within  the  2-CSPD  region  provide 
better  features  for  incipient  wear  characterization. 

Simulation  feature  extraction  results  for  all  the  scenarios  (sec  Table  .5.1 
on  page  11.5)  are  summarized  as  particular  frequency  values  do  not  have  any 
physical  meaning.  The  final  I  lOS  extracted  feature  sets  for  all  simulated  scenarios 
were  composed  of  twenty-eight  power  spectrum,  fifty-one  bispcctrum,  and 
twenty-five  cumulant  spectrum  feature  variables.  For  each  particular  scenario,  the 
number  of  HOS  features  was  larger  than  the  number  of  power  spectrum  features 
and  had  a  higher  staf-  '  al  significance  level.  Most  significantly, of  the 
twenty-five,  or  84  percent,  of  the  second-order  cumulant  spectrum  variables  were 
off  the  pure  cyclostationary  support  grid  in  the  2-(.'SPD.  These  features  were  also 
in  the  middle  to  highest  ranges  of  statistical  significance  with  relation  to  the  other 
features  selected,  fhus,  evidence  from  simulations  weighs  in  favor  that  incipient 
wear  characterization  is  enhanced  by  performing  cumulant  spectrum  estimation 
and  feature  extraction.  1  he  real  test  of  the  hypothesis  will  he  exarninatinn  of 
feature  extraction  results  from  actual  wear  data. 

Visual  inspection  of  each  type  of  cn.scmbic  averaged  spectral  plots  for  the 
two  classes  of  drill  bits  give  a  preliminary  look  of  which,  frequency  variates  are  bit 
class  distinguishable.  .Sec  Figure  4.1  on  page  96  and  Idgure  4.2  on  page  97  for 
ensemble  averaged  power  spectrum  plots  and  ensemble  averaged  bispcctrum 
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chloropleth  plots  for  new  drill  bits  for  a  particular  case  of  the  actual  wear  data 
described  in  Chapter  5.  Also,  see  Figure  4.3  on  page  98  and  Figure  4.4  on  page 
99  for  ensemble  averaged  power  spectrum  and  ensemble  averaged  bispectrum 
plots  for  slightly  used  drill  bits  of  the  same  case. 

These  individual  plots.  Figure  4.1  on  page  96  and  Figure  4.3  on  page 
98,  and  Figure  4.2  on  page  97  and  Figure  4.4  on  page  99,  of  the  different  drill  bits 
are  combined  so  that  dilTerences  in  spectrum  estimates  for  the  two  groups  are 
more  visually  apparent.  Sec  Figure  4..S  on  page  100  and  Figure  4.6  on  page  101 
for  representations  of  the  differences  in  ensemble  averaged  power  spectrum  and 
ensemble  averaged  bispectrum. 

Overlaying  the  representations  of  the  two  ensemble  power  spectrums  and 
their  variability  serves  its  purpose  as  a  preliminary  look  at  what  range  of  fre¬ 
quencies  are  bit  class  distingui  hable.  Differences  in  ensemble  averaged  power 
spectrum  plots  for  all  four  stack/chip  load  cases  were  quite  similar  as  that  shown 
in  Figure  4.5  on  page  100.  Power  spectra  exhibited  the  presence  of  strong  spec¬ 
trum  peaks  at  frequencies  below  5  kllz.  These  peaks  occurred  at  the  shaft  spindle 
rotational  frequency,  /^,  and  its  harmonics,  2f„  and  4f„  and  reflect  the  periodic 
cutting  forces  due  to  hardness  differences  of  the  glass  and  epoxy  material  in  the 
circuit  card  layers.  Most  of  the  signal  content  occurs  at  the  harmonic  frequencies 
and  do  not  appear  useful  as  out.standing  wear  indicators.  However,  there  are  two 
frequency  ranges  which  visually  appear  useful  as  potential  wear  indicators;  fre¬ 
quency  values  near  one-half  the  fundamental  rotational  frequency  of  the  drill 
spindle,  .,5/„  and  between  14-14.5  kllz.  Other  researchers  have  noted  this  sub¬ 
harmonic  structure  with  journal  bearings  in  high-speed  turbomachinery,  some¬ 
times  referring  to  it  as  a  whirl  frequency  (Braun,  1986).  This  may  be  due  to  le.ss 
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Figure  4.3:  Ensemble  Averaged  Power  Spectnjm-NIP3  Case(Slightly  Used 
Drills).  No  smoothing  performed  with  a  block  size  of  800 
samples. 
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Figure  4.4:  En.semble  Averaged  Bispectrum— NIP3  Case  (Slightly  Used 
Drills).  Ensemble  averaged  chi-square  statistical  measures  arc 
transformations  of  bispectra  moduli  or  squared  gain.  Legend 
defined  shows  the  magnitude  of  the  chi-square  statistic  denoting 
frequency  interaction  strength.  No  smoothing  performed  with  a 
sample  block  size  required  to  capture  two  integral  periods  of  the 
process. 
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Figure  4.5:  Ensemble  Averaged  Power  Spectrum  DjfTerenceS”NIP3 
Case.  New  bit  ensemble  average  is  denoted  by  squares  and 
slightly  used  bit  ensemble  average  by  circles.  Small  dashes  arc  95 
percent  confidence  limits  for  new  and  larger  dashes  arc  those  for 
slightly  used.  No  smoothing  performed  with  a  block  size  of  800 
samples. 
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Figure  4.6:  Ensemble  Averaged  Bispectrum  Differences— NFP3 

Case.  Differences  in  ensemble  averaged  chi-square  statistical 
measures  are  transformations  of  bispectra  moduli  or  squared  gain. 
Legend  defined  shows  the  magnitude  of  the  chi-square  statistical 
differences  of  frequency  interactions.  No  smoothing  performed 
with  a  sample  block  size  required  to  capture  two  integral  periods 
of  the  process. 
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frictional  forces  of  a  slightly  worn  drill  bit.  The  higher  range  of  frequency  values 
are  near  a  torsional  resonant  frequency  of  the  drill  spindle.  As  the  top  portion  of 
the  spindle  rotates  in  one  direction,  its  body  rotates  the  opposite  direction  (see 
Figure  5.3  on  page  127).  The  spindle  rotation  causes  the  drill  bit  to  slightly  go 
up  and  down  during  the  drilling  process.  It  appears  the  spindle  torsional  mode  is 
more  strongly  excited  by  new  drills  than  slightly  worn  drills.  A  decaying  torsional 
oscillation  excited  by  contact  of  worn  cutting  surfaces  of  the  drill  is  physically  in¬ 
tuitive. 

The  bispectrum  difference  chloropleth  plots  clearly  show  the  general  re¬ 
gions  and  the  particular  frequency  interaction  pairs  that  are  class  distinguishable. 
Differences  in  ensemble  averaged  bispectrum  chloropleth  plots  for  all  four 
stack/chip  load  cases  are  similar  to  Figure  4.6  on  page  101.  Differences  of  en¬ 
semble  bispectrum  chi-square  values  first  show  drill  class  distinguishability  in  fre¬ 
quency  interaction  regions  composed  of  first  through  the  eighth  harmonics  of  the 
fundamental  rotational  frequency  of  the  drill  spindle  with  frequencies  greater  than 
14  kllz.  A  portion  of  this  significant  different  frequency  structure  may  be  due  to 
parametric  coupling  of  the  torsional  resonant  frequency  with  each  of  its  lower 
harmonics.  This  fact  is  significant  as  Ramirez  (1991)  discovered  from  his  analysis 
of  extended  drill  wear  data  that  the  fifth  through  the  eighth  power  spectrum  har¬ 
monics  of  the  same  accelerometer  (Z  or  thrust  axis)  are  the  most  sensitive  drill 
wear  indicators.  Thus,  a  predictive  capability  may  have  been  demonstrated  with 
bispectrum  analysis  of  the  incipient  wear  data.  Also  significant  is  that  for  all  the 
stack/load  cases,  bispectrum  c.stimates  are  most  significantly  different  in  the  outer 
triangle  (OT)  region  of  the  bispectrum  principal  domain.  This  was  evidence  and 
motivation  for  further  investigation  with  cumulant  spectrum  estimation  and  fea- 
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ture  extraction  as  it  meant  the  existence  of  a  nonstationary  generating  source  in 
the  stochastic  process  (Hinich,  1989). 

extracted  second-order  cumulant  spectrum  features  were  consistently  not 
on  the  pure  cyclostationary  support  grid  in  the  2-CSPD  which  confirms  the  major 
theoretical  proposition  stated  in  Chapter  3.  Furthermore,  they  were  always  the 
most  significant  feature  variables  for  all  but  one  of  the  eight  actual  wear  database 
partition  cases  (NIP4).  Significantly,  the  NIP4  is  the  only  database  partition 
where  no  overall  marginal  improvement  in  discrimination  and  classification  power 
was  obtained  by  incorporating  HOS  feature  information.  This  fact  adds  further 
evidence  that  better  incipient  wear  characterization  is  provided  with  2-CSPD  es¬ 
timates  off  the  cyclostationary  support  grid.  Consider  the  following  physical  ex¬ 
planation  why  these  statistical  correlations  discovered  by  the  feature  extraction 
algorithm  are  most  important.  Cards  were  manufactured  in  the  same  facility  with 
the  same  resin  system  but  had  different  glass  cloth  and  layer  thicknesses.  See 
Figure  4.7  on  page  104  for  two  examples  of  card  construction.  Because  the  glass 
fibers  (oval  disks  in  diagram)  cut  during  each  hole  are  not  uniformly  configured 
in  the  card  layers,  the  vibration  signals  will  have  periodic  and  aperiodic  charac¬ 
teristics  and  reflect  the  effects  of  many  different  cutting  geometries  randomly  en¬ 
countered  by  the  drill.  So  the  cutting  forces  and  energy  represented  by  the 
vibration  measurements  change  within  a  certain  layer  of  the  card  and  also  for  each 
revolution  of  the  drill.  Vibration  measurements  carrying  wear  information  of  the 
drill  cutting  edges  will  thus  be  more  sensitive  to  spectral  correlations  that  are  not 
integer  multiples  of  the  fundamental  rotation  of  the  drill  spindle.  Fxtracted  fea¬ 
tures  shown  in  the  following  tables  reveal  the  importance  of  second-order 


cumulant  estimation. 


104 


Table  4.3 


Actual  Experiment  Feature  Extraction--NIP/3  Case.  Scum  denotes  the  second- 
order  cumulant  spectrum,  Bisp  denotes  the  bispectrum,  and  Spec  denotes  the  power 
spectrum.  Feature  variables  are  listed  in  the  order  entered  by  the  SAS  variable  se¬ 
lection  algorithm.  Cyclic  frequency  of  the  drill  spindle  is  764  Hz. 


Frequency 
Value  (Hz) 

Spectrum 

Type 

F-Stat 

Value 

Off  2-CSPD 

Periodic  Grid? 

9550,1910 

Scum 

16 

yes 

12224,1528 

Scum 

13 

no 

6876,8404 

Bisp 

8.5 

n/a 

14520 

Spec 

7.6 

no 

764,11460 

Bisp 

5.4 

n/a 

1528,15281 

Bisp 

6.7 

n/a 

8608 

Spec 

7.9 

yes 

13290 

Spec 

3.0 

yes 

13190 

Spec 

4.2 

yes 
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Table  4.4 

Actual  Experiment  Feature  Extraction-'6S2P/3  Case.  Scum  denotes  the  second- 
order  cumulant  spectrum,  Bisp  denotes  the  bispectrum,  and  Spec  denotes  the  power 
spectrum.  Feature  variables  are  listed  in  the  order  entered  by  the  SAS  variable  se¬ 
lection  algorithm.  Cyclic  frequency  of  the  drill  spindle  is  764  ilz. _ 


Frequency 
Value  (Hz) 

Spectrum 

Type 

i'-Stat 

Value 

Ofr2-CSPD 

Periodic  Grid? 

8404,1910 

Scum 

17.3 

yes 

8786,1528 

Scum 

9.3 

yes 

4584,15281 

Bisp 

6.4 

n/a 

13520 

Spec 

5.8 

yes 

13060 

Spec 

4.6 

yes 

30.56,25213 

Bisp 

3.1 

n/a 

,5425 

Spec 

3.9 

yes 

4508 

Spec 

6.8 

yes 

9168,22157 

Bisp 

4.0 

n/a 

12988,1146 

Scum 

3.4 

yes 

Table  4.5 


Actual  Experiment  Feature  E.xtraction— NIP/4  Ca.se.  Scum  denotes  the  second- 
order  cumulant  spectrum,  Bisp  denotes  the  bispectrum,  and  Spec  denotes  the  power 
spectrum.  Feature  variables  are  listed  in  the  order  entered  by  the  SAS  variable  se¬ 
lection  algorithm.  Cyclic  frequency  of  the  drill  spindle  is  588  Hz. 


Frequency 
Value  (Ilz) 

Spectrum 

Type 

F-Stat 

Value 

Off  2-CSPD 

Periodic  Grid? 

12936,1176 

Scum 

15 

no 

1 1466,588 

Scum 

10 

yes 

14700,2058 

Scum 

10 

yes 

2353,123.54 

Bisp 

6 

n/a 

1765.8236 

Bi.sp 

6 

n/a 

4706,11766 

Bisp 

4 

n/a 

I 


Table  4.6 

Actual  Experiment  Feature  Extraction--6S2P/4  Case.  Scum  denotes  the  second- 
order  cumulant  spectrum,  Bisp  denotes  the  bispectrum,  and  Spec  denotes  the  power 
spectrum.  Feature  variables  are  listed  in  the  order  entered  by  the  SAS  variable  se¬ 
lection  algorithm.  Cyclic  frequency  of  the  drill  spindle  is  588  Hz. 

Frequency  Spectrum  F-Stat  Off  2-CSPD  | 

Value  (I Iz)  Type  Value  Penouic  Grid? 


Frequency 
Value  (Hz) 

Spectrum 

Type 

F-Stat 

Value 

6468,1470 

Scum 

9.1 

9413,10589 

Bisp 

7.7 

12870 

Spec 

3.6 

2574 

Spec 

4.3 

14494,2352 

Scum 

4.5 

6471,8824 

Bisp 

6.2 

12642,1764 

Scum 

3.2 

9702,1470 

Scum 

4,0 

Table  4.7 

Actual  Experiment  Feature  Extraction— Combined  Load  3  Ca.se.  Scum  denotes  the 
second-order  cumulant  spectrum,  Bisp  denotes  the  bispectrum,  and  Spec  denotes 
the  power  spectrum.  Feature  variables  arc  listed  in  the  order  entered  by  the  SAS 
variable  selection  algorithm.  Cyclic  frequency  of  the  drill  spindle  is  764  Hz. 


Frequency 
Value  (Hz.) 

Spectrum 

Type 

F-Stat 

Value 

Ofr2-C5 

Periodic 

8404,1910 

Scum 

36 

yes 

13370,12988 

Scum 

11 

yes 

3056,9932 

Bisp 

6.0 

n/a 

14870 

Spec 

4.5 

yes 

6876,11460 

Bisp 

.5.4 

n/a 

6112,12988 

Bisp 

4.4 

n/a 

1528,1528 

Bisp 

3.6 

n/a 

8455 

Spec 

4.4 

yes 

95.50,1910 

Scum 

11 

yes 

I 
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Table  4.8 

Actual  Experiment  Feature  Extraction— Combined  Load  4  Case.  Scum  denote.s  the 
second-order  cumulant  spectrum,  Bisp  denotes  the  bispectrum,  and  Spec  denotes 
the  power  spectrum.  Feature  variables  are  listed  in  the  order  entered  by  the  SAS 
variable  selection  algorithm.  Cyclic  frequency  of  the  drill  spindle  is  588  Hz. 


Frequency 
Value  (Hz) 

Spectrum 

Type 

F-Stat 

Value 

01T2-CSPD 

Periodic  Grid.'' 

9702,2646 

Scum 

13.5 

yes 

4410,1176 

Scum 

8.5 

yes 

12642,1764 

Scum 

4.9 

yes 

7060,11178 

Bisp 

5.0 

n/a 

9702,294 

Scum 

4.2 

yes 

8236,11178 

Bisp 

5.0 

n/a 

14494,2352 

Scum 

5.3 

yes 

9354 

Spec 

4.4 

yes 

10589,13531 

Bisp 

5.3 

n/a 

7648,12354 

Bisp 

2.8 

n/a 

Table  4.9 

Actual  Experiment  Fe-lure  Extraction— Combined  Stack  NIP  Ca.se.  Scum  denotes 
the  .second-order  cui.iulant  spectrum,  Bisp  denotes  the  bi.spectrum,  and  Spec  de¬ 
notes  the  power  spectrum.  I'eature  variables  are  listed  in  the  order  entered  by  the 
SAS  variable  selection  algorithm,  ('yclic  frequency  of  the  drill  spindle  is  588  llz  and 
764  117.. 


Frequency 
Value  (llz) 

Spectrum 

fype 

F-Stat 

Value 

Off  2-CSPD 

Periodic  G.  id? 

9408,2058 

Scum 

7.2 

yes 

9413.17061 

Bisp 

5.9 

n/a 

588,6471 

Bisp 

3.6 

n/a 

7060,15296 

Bisp 

3.5 

n/a 

12348,1470 

Scum 

3.5 

yes 

135.30 

Spec 

3.8 

no 
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Table  4.10 

Actual  Experiment  Feature  Extraction— Combined  Stack  6S2P  Case.  Scum  denotes 
the  second-order  cumulant  spectrum,  Bisp  denotes  the  bispectrum,  and  Spec  de¬ 
notes  the  power  spectrum.  Feature  variables  are  listed  in  the  order  entered  by  the 
SAS  variable  selection  algorithm.  Cyclic  frequency  of  the  drill  spindle  is  588  Hz  and 
764  I  Iz. 


Frequency 
Value  (Hz) 

Spectrum 

Type 

F-Stat 

Value 

Ofr2-CSPD 

Periodic  Grid? 

8232,2058 

Scum 

36.0 

yes 

8820,1470 

Scum 

20.0 

yes 

2941,7648 

Bisp 

13.5 

n/a 

12642,2352 

Scum 

10.5 

yes 

1765,8236 

Bisp 

10.4 

n/a 

588,13531 

Bisp 

84 

n/a 

8236,10589 

Bisp 

5.6 

n/a 

13160 

Spec 

5.3 

yes 

4706,16472 

Bisp 

5.0 

n/a 

12870 

Spec 

4.6 

yes 

12936,1176 

Scum 

4.4 

no 

13240 

Spec 

3.7 

yes 

The  evidence  of  IlOS  feature  extraction  of  actual  incipient  wear  data 
clearly  supports  that  IlOS  features  arc  significant  in  the  feature  sets  extracted  to 
define  class  differences.  Additionally,  the  theoretical  proposition  of  second-order 
cumulant  spectrum  estimates  off  the  periodic  support  grid  as  those  which  better 
characterize  incipient  faults  of  cyclostationary  processes  is  confirmed.  Now,  what 
remains  is  an  investigation  of  the  impact  of  the  extracted  IlOS  features  on  classi¬ 
fication  performance,  using  various  multivariate  classifiers  under  dirre>-ent  process 
conditions,  to  test  the  robustness  of  the  new  incipient  fault  detection  approach. 


Chapter  5 

Evaluation  of  HOS  Approach 


5.1  Introduction 

Performing  discrimination  and  classification  tasks  on  simulated  and  actual 
time  series  data  generated  from  processes  that  have  cyclostationary  characteristics 
comprises  the  test  and  evaluation  of  the  new  HOS  incipient  fault  detection  ap¬ 
proach.  Factorial  designs,  data  collected,  and  principles  behind  the  experiments 
are  described  and  then  the  discrimination  and  classification  results  using  different 
feature  extraction  sets  are  given.  Probability  of  false  alarm  and  probability  of 
detection  are  the  measures  of  effectiveness  used  to  evaluate  the  r'^lative  merit  of 
the  various  approaches. 

5.2  Simulated  Wear  Experiment 

Modulation  theory  describes  how  a  pure  deterministic  signal  emitted  by 
a  periodic  force  is  transformed  into  a  signal  actually  measured  by  a  condition 
monitoring  system.  Consider  modulation  as  a  mapping  of  the  driving  force  signal 
space  to  the  measurement  signal  space.  Some  possible  mapping  factors  are  :  (1) 
internally  and  externally  generated  noi.sc,  (2)  structural  propagation,  (3)  change  in 
process  state,  and  (4)  change  in  process  environment.  The  transformation  of  an 
original  driving  force  signal  is  equivalent  to  a  translation  of  spectra.  A  pure  sine 
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wave  tone  is  a  delta  function  or  a  single  spike  in  spectral  representation.  A  change 
in  process  environment  (i.e.  cutting  forces  required  to  cut  through  various 
strengths  and  types  of  material)  will  cause  variation  in  frequency  and  amplitude, 
but  the  spectral  components  which  specify  the  cutting  force  process  dynamics  are 
translated  without  any  change  to  their  relative  energy  distribution--the  peak 
magnitude  is  decreased  but  the  sidebands  correspondingly  increase  to  compensate 
the  energy  loss.  However,  a  change  in  process  state  will  cause  variation  or  mod¬ 
ulation  of  phase  which  will  generate  new  frequencies  with  a  different  energy  dis¬ 
tribution  of  the  signature  signal  spectral  components.  Parameters  of  the 
simulation  experiment  empha.sized  changes  in  phase,  rather  than  changes  in  am¬ 
plitude,  and  its  impact  on  classification/feature  extraction  performance  for  several 
sets  of  process  state  and  process  environment  parameters. 

When  the  driving  force  of  a  physical  system  is  periodical  (eg.,  in  rotating 
machinery),  the  signature  signal  emitted  by  the  process  and  received  by  sensors 
may  be  represented  by  a  harmonic  process  (Priestley,  1986).  Consider  a  rotating 
drill  machine  and  the  process  of  cutting  holes  in  electronic  circuit  cards  which  is 
the  actual  wear  experiment  analyzed  later  in  this  chapter.  Assume  the  signature 
signal  is  a  vibration  time  series  sensed  by  accelerometers.  The  harmonic  process 
model  (11PM)  in  this  case  is: 


k 

V,{t)  =  '^/i„cos{2nnf,t  +  (t>„)  +  n{l)  [d."!] 

n  =  0 

where  VJt)  is  the  voltage  of  the  cosine  wave  carrier  signal;  A  and  (/>  arc  the  am¬ 
plitude  and  phase  terms  of  the  driving  force  mechanism  (drill  rotation)  or  carrier 
signal;  /  is  the  fundamental  carrier  frequency  dcterm'ned  by  the  period  of  the 
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driving  force  function  and  vibration  characteristics  of  the  machine  system;  and 
n(i)  is  the  corrupting  noise  generated  by  other  vibratory  sources  and  distortional 
effects.  Noise  is  assumed  Gaussian  and  independent  to  the  emitted  vibration  sig¬ 
nal  and  k  is  the  number  of  interacting  sinusoids.  Thus,  the  observed  periodic 
voltage  time  series  record  is  described  by  sums  of  sine  and  cosine  waves  whose 
amplitudes  and  phases  are  chosen  to  give  the  best  fit  to  the  observational  data. 
The  decomposition  of  the  periodic  time  signal  is  found  by  obtaining  the  Fourier 
series  of  the  time  scries  record. 

If  {(/»„,  rt  =  0,  1, 2, ...  }  are  identical  and  independent  uniformly  distributed 
variables  on  (  -  re,  n),  { Vc(l),  t  ^  0}  is  stationary  no  matter  what  frequency  and 
amplitudes  are  selected  to  represent  the  voltage  time  series.  Furthermore,  both 
the  autocorrelation  and  autocovariance  functions  of  a  11PM  consist  of  a  sum  of 
cosine  terms  and  therefore  never  die  out  in  contrast  to  moving  average  (MA)  and 
autoregressive  (AR)  processes.  Thus,  finite  dependence  or  finite  memory  where 
joint  random  variables  are  highly  correlated  when  the  time  instants  are  close  to¬ 
gether,  and  low  correlation  when  the  time  instants  are  widely  separated,  is  not 
applicable  to  a  FIRM  representation  of  stochastic  processes,  (sec  Appendix  B  for 
stationarity  of  flPM  and  inapplicability  of  finite  memory). 

In  this  discussion,  consider  the  cosine-wave  carrier  signal  as  referring  to 
[d.S].  Its  amplitudes,  and  phases.  <(>„,  can  be  varied  according  to  modulating 
or  inr'^''mation  signals  representing  physical  wear  processes.  The  cosine-wave 
carrier  signal  is  amplitude  and  pha.se  modulated  once  the  rotating  drill  begins  to 
wear  due  to  its  drilling  holes  in  electronic  panels.  The  inherent  energy  will  fluc¬ 
tuate,  or  he  amplitude  modulated,  due  to  the  change  in  drill  bit  surface  contact 
pressure  and  force  at  each  cutting  revolution  bccau.se  of  differences  in  hardness 


112 


and  thickness  of  panel  materials.  Additionally,  as  more  holes  are  drilled  over  time, 
the  drill  bit  cutting  edges  will  wear  and  cause  phase  modulation  of  the  baseline  or 
reference  (no  wear)  voltage  signal.  Consider  the  situation  where  k  is  just  1. 

The  cosine-wave  carrier  signal,  [45],  or  the  inherent  energy  of  the  rotating 
drill  spindle  is  amplitude  modulated  due  to  its  periodic  nature.  A  cosine  wave  with 
fluctuating  amplitude  is  also  known  as  the  phenomenon  of  beats.  By  superim¬ 
posing  two  cosine  waves  with  nearly  equal  frequencies,  to  +  Sea,  the  result  is: 

cos((u  +  S(o)t  -b  cos((u  —  Sfa)l  =  2  cos  ot  cos  Seat. 

This  oscillates  at  the  average  frequency,  o)  =  2nf„  but  the  amplitude  changes 
slowly  according  to  the  modulating  function,  2  cos  Seat.  So  the  amplitude  modu¬ 
lated  version  of  [45]  where  the  time  reference  is  chosen  so  the  carrier  phase  angle 
is  zero  is: 


Kmil)  =  ^[1  +  mj{t)^  cos(w^/)  +  n{t). 

Multiplying  y(t)  =  2  cos  Seat  by  cos{uKt)  causes  a  spectrum  shift  up  to  a  range  of 
frequencies  surrounding  the  carrier  frequency,  /„  and  the  addition  of  the  carrier 
term  provides  a  discrete  spectral  line  at  frequency  These  range  of  frequencies 
are  sometimes  called  the  lower  and  upper  sidebands  where  each  sideband  contains 
amplitude  and  phase  information  of  the  original  sinusoidal  signal.  Amplitude 
modulation  is  not  the  only  method  of  modulating  a  cosine-wave  carrier. 

Consider  a  frequency  modulated  system  in  which  the  frequency  of  the 
carrier  is  caused  to  vary  in  accordance  with  some  type  of  information-carrying 
signal.  This  could  be  variations  in  the  bit  cutting  forces  due  to  rake  and  flank 
wear  and  minor  “^peed  fluctuations  of  the  drill  spindle  as  it  wears  over  time.  The 
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frequency  of  the  sine  wave  carrier  is  to,  +  k/lt)  with  /(/)  representing  the  phase 
modulating  signal  and  k  is  a  system  constant.  Expressing  the  more  general  fre¬ 
quency  modulated  carrier  in  mathematical  terms  is  difficult  because  one  can  define 
the  frequency  of  a  sine  wave  only  when  the  frequency  is  a  constant.  Strictly 
speaking,  there  is  only  the  sine  or  cosine  of  an  angle.  However,  if  the  angle  varies 
linearly  over  time,  one  can  interpret  the  frequency  as  the  derivative  of  the  angle: 

X(/)  =  cos  ff(i)  =  cos(a),.r  -I-  GJ. 

If  0(f)  does  not  vary  linearly,  the  instantaneous  radian  frequency  w,  is  the  deriva¬ 
tive  of  the  angle  as  a  function  of  time: 


/,(/)  =  cos  0(t);  w,  =  . 

This  now  agrees  with  the  usual  use  of  frequency  if  0  =  a>,f  -E  0,. 

Hence,  the  rotating  drill  spindle  does  not  generate  a  signal  that  is  a  pure 
harmonic  tone,  cos  2n/,t  =  cos  afj,  but  rather  is  an  amplitude  and  phase  modu¬ 
lation  representation  of  [45]: 

KmpmiO  =  k[l  +  mJlO]  cos(o),t  +  +  n(t)  [46] 

where  is  the  amplitude  and  phase  modulated  cosinc-wave  carrier  signal, 

m„  is  the  amplitude  modulation  index,  /(/)  is  the  amplitude  modulating  signal,  <t>, 
is  the  carrier  signal  phase,  is  the  phase  modulation  index,  and  g(t)  is  the  pha.sc 
modulating  signal.  Now.  /[/)  =  cos  mj  and  g(()  -  cos  (n,i  with  and  being  the 
frequency  of  the  amplitude  and  phase  modulating  wave,  respectively. 

(Consider  both  amplitude  and  phase  modulations  modeled  by: 
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N-  I 

*  =  0 

Hcncc,  a  zero-mean  random  fluctuation  vibration  signal  made  up  of  a  sum  of  N 
complex  sinusoids  with  each  frequency  being  described  by  complex  amplitude  A 
has  zero  complex  mean  when  there  is  no  modulation.  Once  wear  progresses,  these 
complex  amplitude  signals  contain  both  random  amplitude  and  phase  modu¬ 
lations  which  result  in  broadening  of  their  bandwidth  or  a  correction  to  the  pure 
line  spectrum,  and  A  becomes  a  nonzero  complex  random  variable.  If  mJlt)  and 
are  zero  mean,  stationary,  and  statistically  independent,  the  power  spectral 
density  of  can  be  derived  to  show  how  the  presence  of  random  amplitude 

and  phase  modulations  produce  bandwidth  broadening  (see  Appendix  C).  The 
primary  interest  for  incipient  fault  detection  is  classifying  changes  in  the  phase 
modulation  index  parameter  of  the  signature  signal  that  corresponds  to  the  degree 
of  wear  or  developing  failure  in  a  rotating  machine  process.  Details  of  the  simu¬ 
lation  experiments  are  given  next. 

Two  hundred  and  fifty  independent  classification  runs  using  three  alter¬ 
native  feature  extraction  methods  for  fourteen  treatments,  or  incipient  failure 
cases,  were  performed.  Three  simulation  parameters  or  factors  arc  changed  in  a 
most  deliberate  fashion  to  represent  fourteen  very  difficult 
discrimination/classification  problems.  These  parameters  are  amplitude  and  phase 
modulation  indicc  values,  and  standard  deviation  of  an  independent  Gaussian 
noise  term.  The  fourteen  treatments  can  be  logically  categorized  as  seven  sce¬ 
narios  shown  at  Table  .5.1  on  page  1 1.5.  liach  scenario  has  two  treatments  with 
a  correspondingly  increased  value  in  phase  modulation  associated  with  a  fixed 
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level  of  amplitude  modulation  and  independent  Gaussian  noise  standard  devi¬ 
ation.  Thus,  each  treatment  entry  of  Table  5.1  on  page  115  is  a  discrimination 
and  classification  problem  of  two  classes  of  two  hundred  and  fifty  simulated  time 
series. 


Table  5.1 

Seven  Incipient  Fault  Detection  Scenarios— Simulated  Wear  Data.  Bach  scenario 
has  two  discrimination/classification  cases.  Numbers  in  parentheses  are  the  simu¬ 
lation  parameters:  amplitude  modulation  index,  phase  modulation  index,  and 
standard  deviation  of  Gaussian  noise. 


Scenario 

Number 

Classification 

Treatment 

lA 

(.3,.7,.4)  vs  (.3,.71,.4) 

IB 

(.3, .7, .4)  vs  (.3,.72,.4) 

2A 

(.3..7,.8)  vs  (.3..71,.8) 

2B 

(.3,.7,.8)  vs  (.3..72,.8) 

3A 

(.3,.7,1.4)  vs  (.3,.71,L4) 

3B 

(.3,.7,1.4)  vs  (.3, .72, 1.4) 

4A 

(.3,.4,.4)  vs  (.3..4I,.4) 

4B 

(.3..4,.4)  vs  (.3..42,.4) 

5A 

(.3..4,.8)  vs  (.3..41.,8) 

5B 

(.3,.4,.8)  vs  (.3, .42, .8) 

6A 

(.3,.4,1.4)  vs  (.3,.4I,L4) 

6B 

(.3, .4,1.4)  vs  (.3..42,L4) 

7A 

(.5,.7,.4)  vs  (.5,.7I,,4) 

7B 

(..5,.7,.4)  vs  (.5..72,.4) 

Standard  International  Mathematical  Statistical  Library  (IMSI.)  routines  were 
used  to  generate  Gaussian  noise  and  deterministic  phase.  Levels  for  amplitude 
modulation  were  considered  fixed  as  it  rcprc.scnts  more  of  a  change  in  environ¬ 
ment  rather  than  a  change  in  process  state;  however,  amplitude  modulation  was 
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changed  for  sensitivity  and  verification  purposes.  There  were  three  phase  modu¬ 
lation  levels,  .7  was  chosen  as  the  base  reference  to  reprc.scnt  a  new  process/object 
condition,  and  .71  and  .72  represented  an  increasingly  worn  condition.  Note  that 
zero  values  for  the  modulation  indices  represent  the  pure  cosine  wave  carrier  fre¬ 
quency.  It  is  important  to  emphasize  that  parameter  selection  was  not  an  arbi¬ 
trary  process,  but  rather  required  an  iterative  parameter  verification  process  to 
ensure  the  simulated  spectral  structure  represented  incipient  fault  problem  situ¬ 
ations.  See  Figure  5.1  on  page  117  and  Figure  5.2  on  page  118  for  examples  of 
the  simulated  raw  time  series  for  two  discrimination/classification  treatments  of 
one  incipient  fault  scenario. 

In  summary,  the  simulation  experiments  generated  time  series  data  for 
marginal  analyses  to  determine  if  there  is  additional  discrimination  power  and 
classification  performance  using  features  provided  by  more  involved  spectrum  es¬ 
timation  procedures  such  as  the  bispectrum  and  second-order  cumulant  spectrum. 
Sensitivity  analyses  also  are  conducted  to  determine  the  impact  of  slight  increases 
in  phase  modulation  and  varying  levels  of  noi.se  on  each  of  the  feature  extraction 
method's  classification  performance. 
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5.2.1  Experimental  Design 

The  experimental  design  is  a  randomized  complete  block.  To  eliminate 
bias  in  the  measurement  of  the  two  major  response  variables,  probability  of  false 
alarm  and  probability  of  detection,  four  strategies  were  employed.  First,  since 
classification  performance  is  directly  related  to  the  trained  discriminant  rule  or 
function,  ten  different  training  functions  were  calculated  for  each  classification 
treatment.  Fiach  of  the  training  rules  were  constructed  from  a  random  sample  of 
thirty  out  of  a  two  hundred  and  fifty  signal  ensemble  for  each  class.  .Second,  a 
jacknife  error  estimation  process  described  in  the  Chapter  2  was  followed  for 
computing  classification  results  for  a  particular  classification  run.  Third,  as  an 
additional  safety  measure  to  properly  and  fairly  compare  feature  extraction 
methods,  a  paired  comparison  T-test  analysis  approach  was  followed  to  eliminate 
any  classification  performance  variability  due  to  different  capabilities  of  the  ten 
training  discriminant  rules.  Lastly,  in  addition  to  training  classification,  test  clas¬ 
sification  was  conducted  to  obtain  estimates  of  actual  classification  performance. 
Parameters  of  each  generated  time  scries  were  the  following:  1178  time  samples, 
.020  seconds  for  total  record  length,  58  kilohertz  sampling  frequency,  760  hertz 
carrier  frequency,  .180  hertz  amplitude  modulating  frequency,  and  190  hertz  phase 
modulating  frequency.  For  each  simulation  treatment  or  incipient  failure  case, 
three  spectral  estimation  and  feature  extraction  methods  were  performed  for  sub¬ 
sequent  input  to  a  linear  classifier. 
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5.2.2  Results 

As  described  in  the  background  chapter  concerning  measuring  difTcrences 
in  multivariate  populations,  Wilks'  lambda  and  averaged  square  canonical  corre¬ 
lation  statistics  are  used  as  discrimination  clTectiveness  measures  for  the  training 
or  discriminant  rules  constructed  from  thirty  random  samples  for  each  class  drawn 
from  the  250  signal  ensemble  groups.  However,  since  classification  is  the  major 
objective  in  many  applications  of  discriminant  analysis,  alternative  spectral  feature 
extraction  approaches  are  best  compared  by  examining  two  major  classification 
performance  components  which  define  the  rate  of  correct  classification:  proba¬ 
bility  of  detection  and  probability  of  false  alarm.  These  classification  performance 
measures  are  reported  as  relative  comparisons  via  paired  t-tests  for  each 
sccnario/classification  treatment.  The  classification  treatments  are  specified  as 
blocks  of  simulation  parameter  triads  (amplitude  modulation  index,  phase  modu¬ 
lation  index,  and  Gaussian  noise  standard  deviation).  These  simulation  parameter 
blocks  defined  the  time  series  class.  Two  types  of  classification  performance  are 
reported:  discriminant  or  (raining  classification  and  test  classification.  Classifica¬ 
tion  results  revealed  no  significant  statistical  difference  in  classification  perform¬ 
ance  between  the  alternative  feature  extraction  methods  for  the  four  treatments 
with  a  high  (1.4)  level  of  noise  standard  deviation.  However,  for  the  ten  other 
treatments  or  five  scenarios  listed  in  labic  5.1  on  page  115,  some  interesting  re¬ 
sults  were  obtained. 
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5.2.2. 1  Discrimination 


The  marginal  discrimination  benefit  of  combining  second-order  cumulant 
spectra  features  to  power  spectra  features  is  shown  in  Table  5.2. 


Table  5.2 

Marginal  Discrimination  Benefit  of  Combining  Power  Spectrum  With  Second 
Cumulant  Spectrum  Features-Simuiated  Wear  Data.  Both  effectiveness  measures 
represent  relative  discriminating  power  of  a  specific  discriminating  function  com¬ 
puted  on  a  random  selection  of  thirty  time  series  of  each  simulated  class.  Power 
spectra  is  denoted  by  PS'  and  second-order  cumulant  spectra  is  denoted  by 
SCUM'. 


Classification 

Wilks' 

Lambda 

Squared  Canonical  Corr 

Treatment 

PS 

PS  &  SCUM 

PS 

PS  &  SCUM 

(.3, .7, .4) 

vs 

(.3..71,.4) 

.499 

.374 

..500 

.604 

(.3, .7, .4) 

vs 

(.3, .72, .4) 

.660 

.424 

.339 

.637 

(.3..7,.8) 

vs 

(.3,.71..8) 

.553 

.499 

.446 

.500 

(.3, .7, .8) 

vs 

(.3..72,.8) 

.506 

.262 

.493 

.737 

(.3, .4, .4) 

vs 

(.3, .41, .4) 

.593 

.309 

.406 

.690 

(•3..4,.4) 

vs 

(.3, .42, .4) 

.640 

.355 

.359 

.644 

bo 

vs 

(.3,.41,.8) 

.489 

.344 

.510 

.655 

(.3, .4, .8) 

vs 

(.3,.42,.8) 

.340 

.306 

.659 

.693 

(.5, .7, .4) 

vs 

(.5,.71,.4) 

.486 

.272 

.513 

.727 

(.5, .7, .4) 

vs 

(.5,.72,.4) 

.552 

.372 

.447 

.627 

Both  the  Wilks'  lambda  statistical  criterion  (lower  is  better)  and  the  averaged 
square  canonical  correlation  improved  significantly  when  nonstationary  feature 
information  is  combined  with  power  spectra  feature  information,  furthermore, 
combining  nonlinear  or  bispectra  feature  information  to  the  already  constructed 
IIOS  feature  set  provides  additional  increases  in  the  discriminant  effectiveness 


measures.  Discrimination  measures  with  this  HOS  feature  composition  compared 
to  only  power  spectra  feature  sets  are  shown  in  fable  5.3. 
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Table  5.3 

Marginal  Discrimination  Benefit  for  Combining  Bispectrum  and  Second-order 
Cumulant  Spectrum  with  Power  Spectrum  Features— Simulated  Wear  Data.  Both 
effectiveness  measures  represent  relative  discriminating  power  of  a  specific  discrim¬ 
inating  function  computed  on  a  random  selection  of  thirty  time  series  of  each  sim¬ 
ulated  class.  Power  spectra  is  denoted  by  PS',  second-order  cumulant  spectra  is 
denoted  by  SCUM',  and  bispectra  is  denoted  by  'B'. 


Classification 

Wilks' 

Lamt  .u 

Squared  Canonical  Corr 

Treatment 

PS 

PS,SCUM,B 

PS 

PS,SCIJM,B 

(.3,.7,.4) 

vs 

(.3,.71..4) 

.499 

.205 

.500 

.8.34 

(.3,.7„4) 

vs 

(.3,.72,.4) 

.660 

.172 

.339 

.867 

(.3..7,.8) 

vs 

(.3, .71, .8) 

.553 

.195 

.446 

.804 

(.3,. 7, .8) 

vs 

(.3,.72,.8) 

.506 

.131 

.493 

.868 

(.3, .4, .4) 

vs 

(.3,.41,.4) 

.593 

.128 

.406 

.871 

(.3, .4, .4) 

vs 

(.3,.42,.4) 

.640 

.172 

.359 

.828 

OC 

vs 

(.3,.41,.8) 

.489 

.198 

.510 

.801 

(.3,.4,.8) 

vs 

(.3, .42, .8) 

.340 

.122 

.659 

.878 

{.5..7„4) 

vs 

(..5,.7I,.4) 

.486 

.232 

.513 

.767 

(.5, .7, .4) 

vs 

(.5,.72,.4) 

.552 

.210 

mV 

.789 

Significantly,  inspection  of  the  ten  discriminant  functions  constructed  for 
each  simulation  treatment  using  stepwi.se  di.scriminant  procedures  revealed  the 
most  stati.'tically  significant  and  more  plentiful  variables  were  of  the  IlOS  variety. 
Although  not  shown  in  either  Table  5.2  on  page  121  or  Table  5.3,  results  showed 
with  regard  to  discriminating  power,  one  type  of  feature  extraction  vector  by  itself 
(power  spectra,  bispcctra,  or  second-order  cumulant  spectra)  was  not  as  powerful 
as  combination  of  feature  types.  In  summary,  llOS  estimation  and  feature  ex¬ 
traction  methods  provide  a  suh.stantial  improvement  in  these  two  discriminating 
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efTectiveness  measures  for  the  simulated  incipient  fault  scenarios,  and  appears 
worthwhile  to  pursue  even  though  in  some  of  the  treatments  diminishing  marginal 
benefits  are  apparent.  In  addition  to  measures  of  discrimination  power,  measures 
of  classifying  power  are  helpful  in  the  comparison  of  the  feature  extraction  meth¬ 
ods. 


5.2.2.2  Clas.sincation 


The  marginal  contribution  results  of  combining  second-order  cumulant 
spectra  features  to  power  spectra  features,  and  also  combining  bispectra  features 
to  second-order  cumulant  and  power  spectra  features,  with  regards  to  training 
classification  are  shown  within  Table  5.4. 


Table  5.4 

Training  Classification  Performance  of  HOS  Features  versus  Power  Spectrum 
Features-Simulated  Wear  Data.  Numbers  represent  relative  performance  differ¬ 
ence  of  ten  discriminant  rules  over  30  classification  runs.  Alpha  is  the  statistical 


significance  level  for  rejecting  ec 

ual  performance  means. 

Feature  Fxtraction 

False  Alarm 

Performance 

Detection 

Performance 

Method 

Prob 

Alpha 

Prob 

Alpha 

PS  &  SCUM  vs  PS 

-4.9 

.0004 

+  4.8 

.0018 

PS,  SCUM,B  vs  PS 

-10.4 

.0001 

+  11.5 

.0001 

Clearly,  combining  nonstationary  feature  information  to  stationary  features  im¬ 
proves  both  false  alarm  probability  and  detection  probability  with  extremely  high 
levels  of  statistical  significance.  Furthermore,  the  inclusion  of  nonlinear  informa¬ 
tion  gives  better  training  classification  performance  with  an  additional  higher  level 
of  statistical  confidence. 
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Evidence  is  clear  that  combining  HOS  features  with  power  spectrum  fea¬ 
tures  improves  training  classification  of  the  simulated  scenario  data.  More  im¬ 
portant  to  the  evaluation  of  the  feature  extraction  methods  is  an  estimate  of  the 
actual  or  test  classification  error  rate.  This  measures  a  feature  extraction  method's 
capability  to  classify  future  time  series  samples.  Hence,  Table  .5.5  and  Table  5.6 
on  page  125,  respectively  show  the  marginal  contribution  to  test  classification 
performance  by  combining  second-order  cumulant  spectra  features  with  power 
spectra  features,  and  combining  bispectra  features  to  second- order  and  power 
spectra  features. 


Table  5.5 

Second-order  Cumulant  Spectra  &  Power  Spectra  vs  Power  Spectra  Feature  Ex¬ 
traction  Test  Classirication-Simulated  Wear  Data.  Numbers  represent  relative 
performance  difference  over  250  runs  per  classification  problem.  Alpha  is  the  sta¬ 
tistical  significance  level  for  rejecting  equal  performance  means. 


Classificatioti 

Treatment 

False  Alarm  Performance 
Prob  Alpha 

Detection 

Prob 

Performance 

Alpha 

(.3,.7,.4) 

vs  (.3,.71,.4) 

-0.3 

.85 

+  2.7 

.0001 

(.3,.7,.4) 

vs 

(.3,.72,.4) 

-4.5 

.0001 

+  4.6 

.0001 

(.3,.7,.8) 

vs 

(.3, .71, .8) 

+  2.7 

.01 

+  2.7 

.01 

(.3,.7..8) 

vs 

(.3,.72,.8) 

+  1.2 

.21 

+  3.5 

.0006 

(.3..4,.4) 

vs 

(.3,.41,.4) 

-0.1 

.90 

+  2.8 

.002 

(•3,.4,.4) 

vs 

(.3,.42,.4) 

-3.1 

.006 

+  4.5 

.01 

(.3,.4,.8) 

vs 

(,3,.4l,.8) 

+  3.6 

.01 

+  3.6 

.01 

(.3..4..8) 

vs 

(.3„42,.8) 

+  0.2 

.97 

+  2.4 

.003 

(.5,.7„4) 

vs 

(.5..71,.4) 

+  1.5 

.16 

+  3.1 

.0001 

(.5, .7, .4) 

vs 

(.5,.72..4) 

+  0.8 

.45 

+  5.9 

.02 
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Table  5.6 

Bispectra,  Second-order  Cumulant  Spectra,  &  Power  Spectra  vs  Power  Spectra  Fea¬ 
ture  Extraction  Test  Classirication— Simulated  Wear  Data.  Numbers  represent  rel¬ 
ative  performance  difference  over  250  runs  per  classification  problem.  Alpha  is  the 
statistical  significance  level  for  rejecting  equal  performance  means. 


Classification 

False  Alarm  Performance 

Detection 

Performance 

Treatment 

Prob 

Alpha 

Prob 

Alpha 

(.3,.7,.4) 

vs 

(.3..71,.4) 

-I.O 

.45 

+  3.4 

.0001 

(.3, .7, .4) 

vs 

(.3..72,.4) 

-4.9 

.0001 

+  6.8 

.0001 

(.3,.7..8) 

vs 

(.3,.71,.8) 

-0.2 

.88 

+  0.3 

.81 

(.3,.7,.8) 

vs 

(.3, .72, .8) 

+  1.2 

.30 

+  3.9 

.001 

(.3,.4,.4) 

vs 

(.3..41,.4) 

-1.5 

.32 

+  1.8 

.002 

(.3..4,.4) 

vs 

(.3, .42, .4) 

-6.5 

.0001 

+  6.2 

.002 

(.3,.4,.8) 

vs 

(.3,.41,.8) 

+  3.0 

.005 

+  3.0 

.005 

(.3..4,.8) 

vs 

(.3,.42,.8) 

+  0.07 

.01 

+  1.0 

.0009 

(.5, .7, .4) 

vs 

(.5,.71,.4) 

-0.4 

.73 

+  4.6 

.0001 

(.5, .7, .4) 

vs 

(.5,.72,.4) 

-0.5 

.69 

+  7.0 

.004 

When  the  feature  information  set  includes  both  stationary  (power  spec¬ 
trum)  and  nonstationary  (second-order  cumulant  spectrum)  components,  better 
test  classification  performance  is  obtained.  Significantly,  detection  performance 
is  increased  for  all  treatments.  Additionally,  within  each  pair  of  treatments,  or 
scenario,  a  greater  change  in  the  pha.se  modulation  index  parameter  is  accompa¬ 
nied  with  an  increased  false  alarm  and  detection  capability.  Thus,  IIOS  features 
appear  sensitive  to  greater  changes  in  phase  modulation  which  implies  they  have 
an  increasing  ability  to  detect  more  severe  wear  condition  states.  Noise  does  im¬ 
pact  classification  performance,  but  the  IIOS  approach  still  maintains  its  superi¬ 
ority  over  the  power  spectrum  approach.  Combining  nonlinear  (bispcctrum) 
feature  information  further  improves  test  classification  performance.  Thus,  there 
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is  an  increasing  marginal  benefit  for  conducting  HOS  estimation  for  subsequent 
feature  extraction. 

5.3  Actual  Wear  Experiment  Description 

Electronic  circuit  card  construction  begins  with  sandwiched  layers  of  very 
thin  copper  and  epoxy-glass  composite  material.  Holes  are  drilled  through  these 
layers  to  provide  pathways  for  interconnections  between  the  copper  conductor 
layers  and  sites  for  solder  attachment  of  electronic  components.  A  typical  elec¬ 
tronic  panel  consists  of  1000  to  5000  holes  with  diameters  of  .5  mm  to  2.5  mm. 
High-speed  machines  (20,000  to  200,000  RPM)  can  drill  1  to  5  holes  per  second 
with  either  single  or  multiple  drill  spindles.  The  drilling  machine  used  by  IBM  in 
their  experimental  study  has  a  drilling  capability  of  up  to  75,000  RPM  and  is 
shown  in  Figure  5.3  on  page  127.  Ramirez  (1991)  gives  a  complete  description 
of  the  mechanics  of  the  machine  structure. 

Most  circuit  card  manufacturing  defects  can  be  traced  to  problems  in  the 
drilling  process  (Block,  1989).  Problems  caused  partly  by  worn  or  damaged  bits 
include  rough  hole  surfaces  due  to  glass  fiber  tearing,  smearing  of  epoxy  from  high 
bit  temperatures,  and  poor  hole  location  and  variation  in  hole  diameter  because 
of  drill  wander.  Thus,  there  are  three  major  reasons  for  IBM  and  others  in  in- 
du.stry  to  investigate  drill  bit  wear  monitoring  methods:  (I)  to  improve  the  quality 
of  finished  electronic  panels  by  reduci.ng  incidence  of  poor  quality  holes;  (2)  to 
reduce  panel  scrap  costs  by  reducing  incidence  of  badly  damaged  holes;  and  (3) 
to  reduce  bit  replacement  co.sts  by  using  as  much  of  its  useful  life  as  possible. 
('Icarly,  panel  quality  is  a  function  of  hole  quality,  which  in  turn,  is  a  function  of 
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drill  bit  condition  and  engineers  in  this  particular  manufacturing  application  agree 
that  there  is  some  point  that  hole  quality  degrades  as  drills  wear,  necessitating  a 
bit  replacement  strategy.  Presently,  drill  bit  replacement  strategy  is  con.servatively 
based  on  the  shortest  observed  useful  life  since  there  is  no  effective  wear  moni¬ 
toring  system  implemented  in  the  industrial  environment.  Actual  wear  varies  with 
lot-to-lot  changes  in  the  workpiece,  drilling  rate,  condition  of  drilling  machine,  and 
type  of  drill  bit.  Drill  bits  characteristically  exhibit  large  variances  in  tool  life. 

IBM  experiment  instrumentation  included  X  (lateral),  Y  (translational), 
and  Z  (vertical)  accelerometers,  a  magnetic  reluctance  probe,  and  X  and  Y 
capacitance  probes  (see  F  igure  5.4  on  page  129).  This  study  focu.sed  on  the 
accelerometer  signal  data.  Two  accelerometers  (X  and  Y)  were  bonded  to  the 
journal  bearing  block  to  measure  lateral  acceleration  transmitted  from  the  drill 
spindle  to  the  machine  structure.  The  Z  accelerometer  was  mounted  to  the  thrust 
bearing  block  which  moved  with  the  spindle. 

5.3.1  E.xperimental  Design 

A  factorial  experimental  design  with  three  factors  was  employed:  age  of 
bit,  material  stack  type,  and  chip  load,  ('hip  load  is  the  amount  of  axial  distance 
travelled  by  the  drill  bit  tip  in  a  single  revolution  or  rotation,  fhere  were  two 
levels  for  bit  age  (new  or  no  holes  drilled,  and  slightly  used  or  8000  holes  drilled), 
three  levels  of  stack  type  (NIP,  6S2P,  and  sliort  NIP),  and  two  levels  for  chip  load 
(3  mil/revolution  and  4  mil/revolution).  A  1.09  mm  diameter  drill  with  a  spindle 
speed  of  47,000  RPM  and  nominal  chip  load  of  .076  mil/revolution  was  used. 
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Figure  5.4:  Instrumentation  of  Drill  Spindle 
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5.3.2  Collected  Data 

Vibration  time  series  for  each  of  the  three  accelerometers  were  analyzed 
for  two  types  of  drill  bits,  new  and  slightly  used.  Ten  bits  of  each  type  were  ran¬ 
domly  selected  and  optically  verified  for  wear  condition.  Figure  .5.5  contains 
magnified  pictures  of  a  typical  new  and  slightly  used  drill  bit.  New  and  slightly 
used  bit  data  were  obtained  for  both  chip  loads  and  two  of  the  three  stack  types. 
Shown  at  Figure  5.6  on  page  132  and  Figure  5.7  on  page  133  are  raw 
accelerometer  time  series  for  a  typical  new  and  slightly  used  drill  bit  for  one  of  the 
stack  material/chip  load  cases.  There  were  three  replications,  or  runs,  for  all 
eighty  bits  in  the  experimental  database  which  produced  720  time  scries  records. 
Preliminary  estimation,  feature  extraction,  and  classification  analyses  consistently 
showed  the  best  sensor  site  for  bit  class  discrimination  was  the  vertical  or  Z 
accelerometer.  This  conclusion  was  confirmed  in  a  physical  sense  as  the  thrust 
forces  are  additive  rather  than  subtractive,  and  had  bettei  signal-noi.se  character¬ 
istics  than  either  the  X  or  Y  accelerometers.  Thus,  results  presented  in  this  report 
are  only  for  the  Z  accelerometer.  For  computational  purpo.ses,  each  3  mil/rev  (4 
mil/rev)  time  series  is  divided  into  appropriate  record  lengths  as  given  the  respec¬ 
tive  760  Hz  (587  Ilz)  harmonic  frequency  of  the  drill  spindle,  an  integer  number 
of  signal  periods  was  necessary  to  avoid  the  elFccts  of  leakage  when  performing 
spectral  estimation  procedures.  Power  spectrum,  cumulant  spectrum,  and 
bispectrum  estimates  are  computed  over  blocks  within  each  time  scries  record. 
All  spectral  estimates  arc  averaged  over  appropriate  block  lengths,  and  then  in¬ 
corporated  into  the  ensemble  averaging  of  all  samples  of  its  particular  class. 
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5.3.3  Results 

Performance  results  are  given  as  two  separate  hut  related  categories,  dis¬ 
crimination  and  classification.  Both  performance  categories  are  reported  by  indi¬ 
vidual  stack  material  and  chip  load  case,  combined  chip  load,  combined  stack 
material,  and  corresponding  averaged  values.  Performance  results  obtained  for 
combined  data  address  the  impact  of  drilling  process  parameter  variation  such  as 
stack  type  and  cutting  conditions  on  the  feature  extraction  methodology.  It  is 
desirable  from  a  monitoring  system  implementation  perspective  that  the  feature 
extraction  and  classification  methodology  should  not  be  severely  impacted  by 
process  parameter  variation.  There  are  various  statistical  measures  to  compare 
the  discriminating  power  of  the  feature  extraction  methods.  In  this  study,  the  ap¬ 
proach  is  to  first  report  Wilks'  lambda  and  squared  canonical  correlation  measures 
for  the  training  or  discriminant  rules  constructed  from  the  number  of  existing 
samples  in  the  experimental  database  for  the  data  partition  types.  However,  since 
classification  is  the  major  objective  in  many  applications  of  discriminant  analysis, 
alternative  spectrum  feature  extraction  approaches  are  also  compared  by  examin¬ 
ing  two  major  classification  performance  components  which  define  the  rate  of 
correct  classification;  probability  of  detection  and  probability  of  false  alarm. 

5.3.3. 1  Discrimination 

-Shown  in  Table  .S.7  on  page  136  is  the  marginal  benefit  of  combining 
nos  feature  sets  with  power  spectrum  features  for  discrimination  effectiveness. 
Both  the  Wilks'  lambda  statistical  criterion  (smaller  is  better)  and  the  averaged 
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square  canonical  correlation  measures  had  significant  margin^'  '.x.iprovement  in 
six  of  the  eight  partition  database  types.  For  the  two  cases  of  no  overall  marginal 
improvement,  N1P4  and  NIP,  difTcrences  are  not  significantly  difierent  from  a  full 
HOS  feature  approach  (spectrum,  cumulant  spectrum,  and  bispectrum)  versus  just 
power  spectrum  features.  Recall  from  Chapter  4  that  the  most  significant 
second-order  cumulant  feature  for  the  NIP4  case  was  on,  rather  than  off,  the 
2-CSPD  diagonal  spectral  support  line.  Significant  marginal  improvement  for  the 
o.her  six  cases  is  obtained  by  combining  cumulant  spectrum  (nonstationary)  fea¬ 
tures  to  power  spectrum  (stationary)  features,  and  for  combining  bispcctrum 
(nonlinear)  to  stationary  and  nonstationary  feature  sets.  Additionally,  marginal 
improvement  is  gained  by  combining  nonlinear  feature  information  for  all  data¬ 
base  partitions,  and  particularly  for  the  two  cases  of  no  overall  marginal  im¬ 
provement  between  HOS  and  power  spectrum  feature  extraction  discrimination 
effectiveness,  a  large  marginal  improvement  with  nonlinear  features  was  gained. 
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Table  5.7 

Marginal  Discrimination  Benefit  of  HOS  Features  versus  Only  Power  Spectrum 
Features— Actual  Wear  Data.  Both  effectiveness  measures  represent  relative  dis¬ 
criminating  power  of  a  specific  discriminating  function  computed  on  thirty  vi¬ 
bration  lime  series  of  each  bit  class,  new  and  slightly  used.  Power  spectrum  is 
denoted  by  'PS',  second-order  cumulant  spectrum  is  denoted  by  '2C',  and 
bispectrum  is  denoted  by  B'. 


Discrimination 

Ca.se 

Wilks'  Lambda 

PS  PS&2C  PS,2C&B 

Squared  Canonical  Corr 

PS  PS&2C  PS,2C&B 

NIP/3 

.429 

.561 

.295 

.570 

.438 

.704 

NIP/4 

.379 

.462 

.430 

.620 

.537 

.569 

6S2P/3 

.510 

.370 

.299 

.489 

.629 

.700 

6S2P/4 

.555 

.530 

.392 

.444 

.470 

.607 

Stack/ Load  Average 

.46R 

.480 

.354 

,530 

.518 

.645 

Chip  Load  3 

.609 

.600 

.530 

.390 

.399 

.469 

Chip  Load  4 

.736 

.525 

.353 

.263 

.474 

.646 

Load  Average 

.673 

.562 

.441 

.327 

.437 

.5.58 

NIP  Stack 

.399 

.6,56 

.4.56 

.600 

.343 

.543 

6S2P  Stack 

.684 

.536 

.446 

.315 

.463 

.553 

Stack  Average 

..541 

.,596 

.451 

.4.58 

.403 

..548 

Inspection  of  the  llOS  discriminant  functions  constructed  for  each  data¬ 
base  partition  type  using  stepwise  discriminant  procedures  revealed  the  more  sta¬ 
tistically  significant  and  number  of  variables  were  of  the  IlOS  variety  rather  than 
the  power  spectrum.  Although  not  shown  in  Table  5.7,  results  did  show  from  the 
standpoint  of  discriminating  power,  one  type  of  spectrum  feature  e.xtraction  vector 
by  itself  (power  spectrum,  bispectrum,  or  sccond-ordcr  cumulant  spectrum)  is  not 
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as  powerful  as  the  combination  of  feature  types.  In  summary,  flOS  estimation 
and  feature  extraction  methods  provide  a  substantial  improvement  in  these  two 
discriminating  effectiveness  measures  for  the  drill  bit  wear  data  and  appears  useful. 

5.3.3.2  Cia.ssirication 

Combining  HOS  features  to  power  spectrum  features  clearly  improves 
discrimination  power.  Equally  important  to  an  evaluation  of  the  feature  ex¬ 
traction  methods  is  an  estimate  of  the  expected  actual  classification  error  rate. 
Results  are  given  with  tables  identified  by  the  applied  multivariate  classification 
algorithm,  two  parametric  approaches  (linear  and  quadratic  discriminant  or  LDF 
and  QDF)  and  one  non-paramelric  approach  (k-nearest  neighbor).  Results  are 
stated  in  the  following  manner:  (I)  within  a  classification  algorithm,  a  direct 
one-to-one  comparison  of  the  feature  extraction  methods  for  stack/load  and  also 
for  the  combined  process  parameters  (cutting  condition  and  stack  material)  clas¬ 
sification  cases;  and  (2)  a  comparison  of  feature  extraction  method  performance 
across  classification  algorithms  for  all  clas.sification  cases.  Comparison  of  the 
feature  extraction  methods  in  this  fashion  allows  an  evaluation  of  each  approach 
for  its  sensitivity  to  stochastic  process  conditions  and  also  to  the  classification  al¬ 
gorithm. 

The  contribution  of  combiiiing  HOS  features  to  power  spectrum  features 
using  a  linear  classifier  is  shown  in  fable  .‘i.8  on  page  13d.  LDF  classification 
results  demonstrate  the  marginal  benefit  of  performing  HOS  estimation  and  fea¬ 
ture  extraction  for  all  classification  cases.  Averaged  classification  performance 
measures  (stack/load,  load,  and  stack)  reveal  combining  HOS  features  with  power 
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spectrum  features  obtains  an  increasing  marginal  benefit  in  terms  of  overall  clas¬ 
sification  accuracy.  False  alarm  rates  may  be  higher  in  some  cases,  but  the  mar¬ 
ginal  increase  in  detection  capability  makes  up  for  the  difference  in  lost  capability. 
This  performance  result  is  significant  as  a  higher  detection  capability  is  more  de¬ 
sirable  than  a  lower  false  alarm  rate  in  most  industrial  manufacturing  situations. 
For  each  of  the  feature  extraction  approaches,  better  performance  is  obtained  with 
databases  which  are  the  most  homogeneous.  Also,  they  all  have  better  classifica¬ 
tion  ability  with  the  combined  stack  material  database  than  the  combined  cutting 
(load)  database.  Thus,  there  is  less  sensitivity  to  cutting  (chip  load)  variation  than 
stack  material  variation  which  agrees  with  a  major  finding  of  Ramirez  (1991)  that 
variations  in  circuit  card  con.struction  can  mask  the  effects  of  wear.  Significantly 
though  with  the  more  heterogeneous  data  (combined  load  and  combined  slack), 
the  performance  of  the  HOS  approaches  is  not  degraded  as  much  as  the  purely 
power  spectrum  approach. 
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Table  5.8 

HOS  Feature  Extraction  versus  Solely  Power  Spectrum  Feature  Extraction  Classi¬ 
fication  Using  LDF  Algorithm— Actual  Wear  Data.  Numbers  represent  percent  of 
thirty  slightly  used  drill  bits  incorrectly  classified  as  new,  or  false  alarm  rate,  and 
percent  of  thirty  drill  bits  correctly  classified  as  slightly  used,  or  detection  rate. 
Power  spectrum  is  denoted  by  PS',  second-order  cumulant  spectrum  is  denoted  by 
'2C',  and  bispectrum  is  denoted  by  'B'. 


Classification 

Case 

False  Alarm  Probability 
PS  PS&2C  PS,2C&B 

Detection  Probability 

PS  PS&2C  PS,2C&B 

NIP/3 

.133 

.366 

.200 

.850 

.966 

.933 

NIP/4 

.133 

.2.50 

.233 

.933 

1.00 

.966 

6S2P/3 

.150 

.166 

.100 

.862 

.933 

.896 

(^SIVIA 

.283 

.350 

.150 

.816 

.933 

.916 

Stack/Load  Average 

.175 

.283 

.170 

.865 

.958 

.927 

Chip  Load  3 

.150 

.316 

.300 

.728 

.850 

.816 

Chip  Load  4 

.258 

.300 

.233 

.675 

.866 

.850 

Load  Average 

.204 

.308 

.266 

.701 

.858 

.833 

6S2P  Stack 

.250 

.316 

.200 

.661 

.950 

.933 

NIP  Stack 

.183 

.333 

.250 

.933 

1.00 

.900 

Stack  Average 

,216 

.324 

.225 

.797 

.975 

.917 

Shown  in  Table  5.9  on  page  140  is  the  marginal  contribution  of  combin¬ 
ing  IIOS  features  to  power  spectrum  features  using  a  quadratic  classifier.  Similar 
marginal  benefit  results  with  IIOS  information  are  obtained  as  with  a  linear 
classifier,  but  the  more  difficult  parametric  classification,  quadratic  rather  than  a 
linear  function,  is  not  beneficial  for  the  power  spectrum  feature  extraction  ap¬ 
proach. 
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Table  5.9 

HOS  Feature  Extraction  versu.s  Solely  Power  Spectrum  Feature  Extraction  Classi¬ 
fication  Using  QDF  Algorithm— Actual  Wear  Data.  Numbers  represent  percent  of 
thirty  slightly  used  drill  bits  incorrectly  classified  as  new,  or  false  alarm  rate,  and 
percent  of  thirty  drill  bits  correctly  classified  as  slightly  used,  or  detection  rate. 
Power  spectrum  is  denoted  by  PS',  second-order  cumulant  spectrum  is  denoted  by 
'2C',  and  bispectrum  is  denoted  by  'B'. 


Classification 

Case 

False  Alarm  Probability 
PS  PS&2C  PS,2C&B 

Detection  Probability 

PS  PS&2C  PS,2C&B 

NIP/3 

.183 

.366 

.333 

.833 

1.00 

1.00 

NIP/4 

.183 

.166 

.200 

.833 

.966 

.966 

6S2P/3 

.150 

.200 

.100 

.810 

.966 

.931 

6S2P/4 

.300 

.300 

.250 

.700 

.816 

.88.3 

Stack/Load  Average 

200 

.2.58 

.220 

.794 

.937 

.945 

Chip  Load  3 

.175 

.366 

.266 

.737 

.916 

.950 

Chip  Load  4 

.258 

.266 

.116 

.675 

.833 

.900 

Load  Average 

.216 

.316 

.191 

.706 

.874 

.925 

6S2P  Stack 

.241 

.266 

.133 

.635 

.933 

.933 

NIP  Stack 

.108 

.400 

.333 

.933 

1.00 

1.00 

Stack  Average 

.175 

.333 

.233 

.784 

.967 

.967 

Average  stack/Ioad  classification  is  degraded  with  power  spectrum  features  and 
has  no  consequential  impact  on  either  averaged  load  or  stack  classification. 
However,  the  impact  on  averaged  total  classification  accuracy  using  IIOS  features 
is:  -1.5  percent  for  stack/load,  +8.3  percent  for  combined  load,  and  +2.1  percent 
for  combined  stack.  For  the  stack/load  case,  false  alarm  rate  increased  more  than 
the  corresponding  increase  in  detection  capability  so  there  was  a  slight  decrease 
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in  overall  QDF  classification  performance  for  this  database  partition.  However, 
the  masking  of  wear  effects  due  to  variations  in  card  construction  is  not  as  great 
with  HOS  features  and  QDF  classification.  The  major  disadvantage  of  a  power 
spectrum  approach  is  almost  overcome.  Significantly,  detection  capability  was  the 
major  component  of  the  increase  in  the  LDF  classification  accuracy  of  HOS  fea¬ 
tures  using  a  QDF  approach. 

The  marginal  contribution  of  combining  HOS  features  to  power  spectrum 
features  with  a  non-parametric  classifier  (k-nearest  neighbor  with  k  =  4)  is  shown 
in  Table  5.10  on  page  142.  Direct  comparisons  of  these  classification  results 
clearly  show  the  increased  overall  classification  power  of  the  HOS  feature  ex¬ 
traction  approach.  There  are  increases  in  both  false  alarm  and  detection  capabil¬ 
ity  with  HOS  features.  Additionally,  increasingly  marginal  benefits  are  evident  as 
more  spectral  feature  types  are  combined.  Nearest  neighbor  classification  makes 
no  difference  or  degrades  previous  parametric  classification  results  with  solely 
power  spectrum  features  due  to  increases  in  false  alarm  probabilities.  However, 
the  amount  of  increased  total  classification  accuracy  due  to  the  non-parametric 
method  ranges,  in  an  absolute  sen.se,  from  3  to  8  percent,  with  combined  HOS  and 
power  spectrum  feature  sets.  Additionally,  combined  chip  load  classification  with 
power  spectrum  and  cumulant  spectrum  features  is  only  slightly  degraded  with  the 
change  in  card  material.  There  is  no  doubt  non-parametric  classification 
(4-nearest  neighbors)  is  the  best  classification  approach  with  this  incipient  drill 


wear  databa.se. 


142 


Table  5.10 

HOS  Feature  Extraction  versus  Solely  Power  Spectrum  Feature  Extraction  Classi¬ 
fication  Using  4-Nearest  Neighbor  Algorithm— Actual  Wear  Data.  Numbers  repre¬ 
sent  percent  of  thirty  slightly  used  drill  bits  incorrectly  classified  as  new,  or  false 
alarm  rate,  and  percent  of  thirty  drill  bits  correctly  classified  as  slightly  used,  or 
detection  rate.  Power  spectrum  is  denoted  by  'PS',  second-order  cumulant  spec¬ 
trum  is  denoted  by  '2C',  and  bispectrum  is  denoted  by  'B'. 


Classification 

Case 

False  Alarm  Probability 
PS  PS&2C  PS,2C&B 

Detection  Probability 

PS  PS&2C  PS,2C'&B 

NlP/3 

.150 

.266 

.133 

.850 

.936 

.983 

NIP/4 

.100 

.166 

.133 

.933 

1.00 

.966 

6S2P/3 

.366 

.300 

.133 

.844 

1.00 

1.00 

6S2P/4 

.183 

.216 

.200 

.850 

.933 

.966 

Stack/ Load  Average 

.200 

.230 

.150 

.869 

.966 

.962 

Chip  Load  3 

.4t)8 

.183 

.316 

.822 

.933 

.933 

Chip  Load  4 

.175 

.266 

.200 

,775 

.933 

.950 

Load  Average 

.291 

.224 

.2.58 

,798 

.933 

,941 

6S2P  Stack 

.516 

.183 

.166 

.745 

966 

.983 

NIP  Stack 

.266 

.233 

.250 

.900 

.866 

.933 

Stack  Average 

.391 

.208 

.208 

,822 

.916 

.958 

Usefulness  of  combining  IIOS  feature  sets  with  power  spectrum  feature 
sets  is  clearly  demonstrated  with  the  results  gathered  from  simulated  and  actual 
wear  experiments.  A  MOS  approach  for  incipient  fault  detection  has  increa.sed 
discrimination  and  classification  power  and  is  less  sensitive  to  process  and  noise 
conditions  than  solely  a  power  spectrum  approach.  C'onclusions  of  the  study  arc 
stated  in  the  next  chapter. 


Chapter  6 

Conclusions  and  Further  Research 

Inferences  from  the  data  analyses  of  the  conducted  experiments  are  stated 
in  general  and  specific  form.  This  research  focused  on  cyclostationary  processes 
represented  by  simulations  of  single-tone  amplitude  and  phase  modulated  carrier 
signals  which  primarily  emphasized  phase  modulation  changes  and  new  and 
slightly  worn  high-speed  drills  in  the  "manufacturing  environment”.  The  evidence 
clearly  advocates  for  the  adoption  of  a  HOS  feature  fu.sion  approach  in  a  condi¬ 
tion  monitoring  scheme  for  rotating  sy.stems.  Whether  the  HOS  approach  can 
create  actual  economic  savings  in  an  industrial  setting  is  a  question  left  for  further 
research. 

Two  important  general  conclusions  are  drawn  from  the  results  of  this 

study: 

1.  Incipient  fault  detection  capability  of  multivariate  classifiers  significantly  im¬ 
prove  with  nos  feature  information. 

2.  Better  operations  and  maintenance  decisions  to  discontinue/service  rotating 
systems  are  possible  if  the  condition  monitoring  method  incorporates  the 
HOS  feature  fusion  approach. 

five  secondary  research  questions  supported  these  general  conclusions: 
(I)  What  is  the  impact  of  combining  HOS  features  with  power  spectrum  features 
upon  discrimination  and  classification  of  incipient  faults?  (2)  What  is  the  impact 
of  a  changing  process  environment  upon  classification?  (3)  What  is  the  impact 
of  applied  classifier  algorithm  upon  classification?  (4)  What  is  the  impact  of  a 
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slight  change  in  phase  modulation  upon  discrimination  and  classification?  (5) 
What  is  the  impact  of  increasing  noise  in  the  signal  environment  upon  discrimi¬ 
nation  and  classification?  Results  from  the  modulated  signal  simulations  an¬ 
swered  all  secondary  research  questions  except  for  the  third  one  while  results  from 
the  actual  experiment  answered  the  first  three  questions. 

In  the  simulation  experiments,  when  the  feature  information  set  included 
HOS  features  better  discrimination  power  was  obtained  (see  Table  5.2  on  page 
121  and  Table  5.3  on  page  122).  Additionally,  when  the  feature  information  set 
included  both  power  spectrum  and  second-order  cumulant  spectrum  features, 
better  training  and  text  classification  performance  was  obtained  than  with  just  a 
ower  spectrum  feature  set  (see  Table  5.4  on  page  123  and  Table  5.5  on  page 
iz4).  Further  improvement  in  training  and  test  classification  performance  was 
obtained  when  bispectrum  features  were  combined  with  second-order  cumulant 
and  power  spectrum  features  (see  Table  5.4  on  page  123  and  Table  5.6  on  page 
125).  Thus,  HOS  estimation  for  .sub.sequent  feature  extraction  provided  an  in¬ 
creasing  marginal  benefit  for  a  linear  classifier.  HOS  features  were  sensitive  to 
very  slight  changes  in  phase  modulation  which  implied  the  ability  to  detect  incip¬ 
ient  faults  and  a  greater  potential  capability  to  detect  more  severe  wear  condition 
states  of  rotating  machinery.  Finally,  noise  impacted  classification  performance 
whether  with  or  without  IIOS  information.  However,  with  moderate  or  even  high 
levels  of  noise  in  the  signal  environment,  HOS  approaches  were  still  better  at  de¬ 
tecting  different  simulated  signal  classes  with  very  high  levels  of  statistical  confi¬ 
dence  (see  Table  5.5  on  page  124  and  fable  5.6  on  page  125). 

In  the  actual  experiment,  when  the  feature  information  set  included  power 
spectrum  and  second-order  cumulant  spectrum  features,  better  discrimination 
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power  was  obtained  than  a  feature  set  based  only  on  the  power  spectrum.  Further 
discriminatory  power  was  obtained  by  combining  bispectrum  features  with 
cumulant  spectrum  and  power  spectrum  feature  sets  (.see  Table  5.7  on  page  136). 
This  same  marginal  beneficial  trend  was  demonstrated  with  classification  results. 
When  power  spectrum  and  second-order  cumulant  spectrum  features  were  com¬ 
bined,  classification  performance  increased  from  that  of  a  power  spectrum  feature 
set.  The  classification  performance  further  improved  for  all  three  applied 
multivariate  classifiers  when  bispectrum  features  were  combined  with  cumulant 
spectrum  and  power  spectrum  feature  sets  (see  Table  5.8  on  page  139,  Table  5.9 
on  page  140,  and  Table  5.10  on  page  142). 

Actual  wear  classification  results  presented  in  the  tables  of  Chapter  5  are 
now  condensed  as  total  classification  averages  (see  Table  6.1  on  page  146).  All 
feature  extraction  methods  were  sensitive  to  changes  in  process  parameters. 
However,  HOS  feature  extraction  was  less  sensitive  than  solely  power  spectrum 
feature  extraction.  Specifically,  variations  in  card  construction  significantly 
masked  the  effects  of  wear  when  power  spectrum  features  were  used  for  all  clas¬ 
sification  approaches.  Also,  chip  load  variation  masked  the  effects  of  wear  when 
power  spectrum  features  were  used  with  two  of  the  three  classifiers.  However, 
variations  in  card  construction  and  chip  load  only  slightly  masked  the  effects  of 
wear  when  power  spectrum  and  cumulant  spectrum  features  were  used  with  the 
4-ncarest  neighbor  classifier.  Furthermore,  variations  in  card  construction  and 
chip  load  had  no  wear  masking  effect  when  full  HOS  feature  sets  were  used  with 
a  quadratic  classifier.  By  selecting  and  combining  HOS  features  which  captured 
the  nonstationary  and  nonlinear  characteristics  of  the  cutting  forces  as  the  drill 


bit  penetrates  the  circuit  card  layers,  total  classification  capability  was  definitely 
enhanced. 
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Table  6.1 

Actual  Incipient  Wear  Total  Classification  Averages.  Combined  load  database 
tested  the  impact  of  stack  variation,  combined  stack  tested  the  impact  of  load  var¬ 
iation,  and  load/stack  was  the  most  homogcneovis  database  partition  with  no  vari¬ 
ation  of  drilling  process  parameters.  'PS'  represents  power  spectrum,  '2C' 
represents  second-order  cumulant  spectrum,  and  'B'  represents  bispectrum.  'LDF' 
and  'QDF'  denotes  linear  and  quadratic  parametric  classification,  and  'NN'  denotes 
the  nearest  neighbor  non-parametric  classification. 


Features  and 
Classifration 

Combined  DataBases 
Comb  Load  Comb  Stack 

Homogeneous  DataBase 
Load  and  Stack 

PS  &  LDF 

7.-5. 1  79.1 

84.5 

PS.  2C  &  LDF 

77.9  82.6 

83.8 

PS,  2C,  B  &  LDF 

78.3  84.6 

87.8 

PS  &  QDF 

74.5  80.5 

79.7 

PS,  2C  &  QDF 

77.9  81.7 

84.0 

PS,  2(:.  B  &  QDF 

86.7  86.7 

86.3 

PS  &  NN 

75.4  71.6 

83.5 

PS,  2C  &  NN 

85,4  85.4 

86.8 

PS,  2C,  B  &  NN 

84.1  87.5 

90.6 

Thus,  results  of  all  five  secondary  research  questions  revealed  that  a  con¬ 
dition  monitoring  approach  based  on  power  spectrum  characteristics  was  more 
sensitive  to  external  noise  and  stochastic  process  parameter  variation  than  that 
which  incorporated  HOS  information.  These  results  provided  statistical  evidence 
for  the  two  general  conclusions  of  the  study  and  clearly  demonstrated  the  benefits 
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of  HOS  estimation  and  feature  extraction  as  preprocessing  steps  for  a  multivariate 
classifier. 

6.1  Areas  of  Further  Research 

Further  studies  are  possible  in  both  the  applications  and  methodology 
areas.  First,  analysis  of  IBM  extended  drill  wear  data,  already  gathered  and  ob¬ 
tained,  is  needed  so  more  than  two  cla.sses  of  drills  can  be  examined.  Hence, 
testing  the  ability  of  classifiers  with  and  without  HOS  features  to  detect  different 
levels  of  drill  wear  can  be  investigated.  The  particular  HOS  feature  sets  selected 
from  analyses  of  the  incipient  wear  factorial  experiment  can  be  further  examined 
for  their  predictive  ability  of  advancing  drill  wear.  It  is  already  known  that  the 
fifth  through  the  eighth  harmonics  of  the  Z  acceleration  power  spectrum  were 
most  sensitive  to  advanced  drill  wear  (Ramirez,  1991).  These  particular  power 
spectrum  responses  steadily  increased  as  drill  wear  progressed,  and  rapidly  in¬ 
creased  when  drill  wear-out  was  achieved.  Quite  significantly,  the  bispectrum 
chloropicth  difference  plots  of  each  case  of  the  incipient  wear  data  revealed  these 
same  harmonic  frequencies  interacting  with  higher  frequencies  to  be  among  the 
most  significantly  different  frequency  interactions  between  classes!  It  is  possible 
that  bispectrum  analysis  can  be  u.sed  as  a  predictive  tool  of  advancing  wear. 
Second,  a  large  database  of  pump  and  fan  failure  data  obtained  from  TRACOR 
(Austin)  can  be  studied  for  applicability  to  other  rotating  machinery  besides 
high-speed  drills.  The  TRACOR  data  is  already  analyzed  by  some  of  the  vibration 
analysis  techniques  mentioned  in  Chapter  2.  Results  from  the  HOS  analytical 
approach  could  be  compared  with  the  results  of  these  other  techniques.  Third, 
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other  classification  algorithms  such  as  neural  networks  could  be  applied  to  inves¬ 
tigate  whether  trends  identified  in  this  research  continue  to  hold.  Obtaining  the 
right  type  of  information  required  for  proper  further  investigation  of  the  llOS 
approach  is  tedious,  difficult,  and  expensive.  The  hard  work  of  obtaining  excel¬ 
lent  experimental  data  is  done  as  both  rotating  machine  failure  databases  are 
available  from  the  author.  Some  ideas  of  expanding  on  the  methodological  work 
is  given  next. 

First,  third-order  cumulant  estimation  and  feature  extraction  can  be  per¬ 
formed  for  the  experiments  described  in  this  work.  This  is  extremely  important 
to  investigate  as  the  contribution  from  this  cumulant  spectrum  measure  will 
probably  be  more  significant  than  the  second-order  cumulant.  Second,  more  in¬ 
volved  and  complicated  simulation  experiments  such  as  testing  with  dilTering  lev¬ 
els  of  the  experimental  parameters  and  also  multiple-tone  simulations  are  needed. 
Third,  more  investigation  with  regard  to  the  correspondence  of  the  statistical 
findings  of  this  research  to  the  actual  physics  of  wear  processes  occurring  in 
high-speed  drilling  of  composite  circuit  card  materials  is  a  good  research  project 
for  a  mechanical  systems  graduate  student.  Finally,  depiction  of  how  the  statis¬ 
tically  significant  features  change  and  move  through  spectral  principal  domain 
regions  over  multiple  conditions  of  the  machine  is  another  area  for  methodological 
rcseaich. 

6.2  Summary 

This  research  study  was  significant  in  many  respects.  This  is  the  first 
work  to  provide  a  rigorous  study  of  the  IIO.S  approach  in  detecting  incipient 
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faults.  Moreover,  this  is  the  first  work  to  address  the  nonstatiortary  aspects  of 
random  fault  mechanisms.  Mo.st  incipient  fault  detection  methods  are  usually 
tested  against  one  specific  application  or  machine  system.  Due  to  the  inherent 
stochastic  nature  of  the  systems  under  study,  a  statistical  and  experimental  design 
framework  is  necessary  to  thoroughly  investigate  a  particular  monitoring  ap¬ 
proach.  Deriving  statistical  conclusions  which  show  consistency  with  both  simu¬ 
lated  and  actual  time  series  signals  gives  validity  for  the  new  HOS  monitoring 
approach.  Understanding  the  necessity  to  investigate  and  justify  structural  as¬ 
sumptions  such  as  linearity,  Gaussianity,  and  stationarity  of  time  series  data  is  one 
of  the  major  lessons  learned  in  this  study.  This  research  explains  the  procedures 
for  manipulating  such  time  series  data  so  that  other  rotating  machinery  situations 
can  be  properly  analyzed.  Because  the  research  approach  provides  the  tools  for 
investigating  and  exploiting  a  wider  potential  range  of  time  series  characteristics 
generated  by  random  fault  mechanisms  of  cyclostationary  processes,  improved 
executive  decisions  in  both  the  maintenance  and  operational  environments  of  ro¬ 
tating  machinery  are  possible. 
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Appendix  A 

Second-Order  Cumulant  Spectrum  Estimation  Program 
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Appendix  B 

Hannonic  Process  Model  Stationarity  and  Finite  Memory 
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Harmonic  Process  Model  (HPM)  Stationarity  and  Finite  Memory 


When  (/)„  are  independent  uniformly  distibruted  random  variables,  V(t)  is 
always  stationary  irrespective  of  A„  and  m„  values.  There  are  two  conditions  for 
stationarity: 

£[F,]  =  OforaUt  [S-1] 

and 

C(7v(F„F2)  =  Cov(T)  =  /?^(r)  [5-2] 

where  t  is  the  time  shift  or  lag  parameter. 

B-1  is  easily  shown  first  and  consider  B-2  for  the  n  =  1  case.  The  argu¬ 
ment  shown  is  easily  extended  to  the  general  case.  Because  of  the  given  informa¬ 
tion  the  expected  value  of  V  (t)  is: 


cos(w,./  +  (f>)d<}) 


=  ^[sin(f/v/)  +  <^]]  =0 


for  all  t 


Now  for  the  second  condition  for  stationarity. 
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Cov{V„  V,)^E{Vi,)}{V{t+r)i  r  =  0.  ±  1.  ±  2,  ... 

«•  OO  p  oo 

=  -y  TtJ  cos(a)^/  +  <f))d<f>  -Y  TT  cos  {a)^(t  4-  t)  + 

— oo  oo 

2 

=  J  cos(<D/  +  (p)  cox(o)(/  +  r)  +  (^)d(f> 


-OO 

2 

=  J  [  co.s  {(2<x)t  +  for)  +  2<p}  +  cos  cuT^dif) 

— OO 

2  ^  -|oo 

(  cos  ft)T)rf<^  = cos  WT  J 


=  [tt  cos  (dt  —  {  —  n)  cos  wt] 

^2  2 

=  —i —  (2 re  cos  ojt)  =  -7-  cos(fi)T). 
47r  2 


For  the  general  case  ; 


OO 


/I  5=  0 


Thus,  the  autocorrelation  function  p(t): 


^(r) 

im 


V  ^ 

}  a„  cos  o)„T 

n  =  0 _ 

00 

cos 

n  —  0 


00 

^COSO>„T 
«  =  0 
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So,  both  the  autocorrelation  and  autocovariance  functions  of  a  harmonic  process 
consist  of  a  sum  of  cosine  terms  and  thus  never  die  out.  This  is  in  contrast  to 
MA  and  AR  processes  and  so  the  finite  dependence  assumption,  or  finite  memory, 
IS  not  applicable  for  HPM.  Stationarity  is  applicable  no  matter  what  choice  is 
made  of  the  amplitude  and  frequency  term.s. 


Appendix  C 

Power  Spectrum  Broadening 
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Cosine- Wave  Carrier  Signal  Spectrum  Broadening 


Signals  generated  from  rotating  machinery  not  yet  performing  a  particular 
machining  process  produce  a  pure  harmonic  tone  due  to  its  periodic  driving  force 
mechanism:  cos  Infj  =  cos  cdJ.  However,  once  machining  is  performed,  the  gen¬ 
erated  signal  is; 


KmpmiO  =  *[•  +  W^/)]  COS{(0,t  +  <t>c  +  inpg{t))  +  n{t) 

Notation  described  in  the  report  text  is  repeated  in  this  appendix  but  is  condensed 
for  ease  of  presentation.  In  the  text,  is  the  amplitude  and  phase  modu¬ 

lated  cosinc-wavc  carrier  signal,  w,  is  the  amplitude  modulation  index,  J[i)  is  the 
amplitude  modulating  signal,  0,  is  the  carrier  signal  phase,  m,  is  the  phase  modu¬ 
lation  index,  and  g{t)  is  the  phase  modulating  signal.  Now. 
/{t)  =  cos  Mj  and  ^(0  =  cos  o),/  with  f,  and  f,  are  the  frequency  of  the  amplitude 
and  phase  modulating  wave,  respectively.  Briefer  notation  for  this  appendix  is  the 
following: 


Kmpmil'f  =  +  w(r)]  COs(tzV/  +  d>(0  4-  0) 

whc  '  andoin  amplitude  and  phase  modulations  are  represented  by  m{t)  and  4>(i) 
respectively,  and  0  is  the  random  carrier  phase  variable  that  is  independent  of  both 
the  random  amplitude  and  phase  modulation  variables  and  has  the  same  uniform 
pdf  If  m{t)  and  (^(/)  are  zero  mean,  stationary,  and  statistically  independent  ran¬ 
dom  variables,  the  power  spectral  density  of  is  now  derived. 

First,  the  autocovariance  {Rv-  ])  is  computed; 
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«^,  =  £■[(!  4-m)(l  4- w')]  £[cos(p  +  (^  +  0)  cos(p' +  (^' +  0)  ]  [C-  I] 

where  m'  =  0' s  <^(/'),  p  =  cu./,  and  p'  =  toj'.  Also  the  shortened  notation 

of  CC  =  E\_  cos(p  4-  </>  +  0)  cos(p'  4-  (^'  4-  0)  ]  will  be  utilized.  The  first  ensemble 
average  of  C-1  is 


£[(1  4-m)(l +m')]=  1 +  [C-2] 


where  r  =  i'  —  t,  R„  =  E  ],  and  the  ensemble  averages  of  m  and  m'  van¬ 

ish  because  m  is  a  zero-mean  random  variable.  The  second  ensemble  average  of 
C-1  is  more  work  to  calculate.  Derivation  uses  the  trigonometric  identity  for  cos 
(a  4  b)  and  the  fact  that  0  and  0  are  statistically  independent: 

CC  =  £  [  cos(p  4-  4>)  cos(p'  4-  <f)')  ]£  [  cos^0  ]  4-  £  [  sin(p  4-  0)  sin(p'  4-  <j>')  ] 
£[sin^0] 

-  {  [  sin(p  4-  4))  cos(p'  4-  <^')  3  4-  [  cos(p  4-  <t>)  sin(p'  4-  <^>')]}  £  [  co5^0  ] 

=  y  £  [  cos(p  -  p'  4-  0  -  ] 

=  y  cos(p  -  p')£  [  cos(<^  -</>')]-  y  sin(p  -  p')£  [  sin(0  -  </>')  ]. 


Because  |(/)|  is  very  small  in  comparison  with  unity,  Taylor  series  expansions  of 
sin((:^  —  4)')  and  cos{4>  —  4>')  are  used  to  give: 


CC~^co.s(p-p')[l  -±El(4>-4>'f~i+  -  ] 
-4-i^,sin(p-p')[£[(,/,-,^f]+  ]. 


[C-4] 


Ignoring  terms  of  third-order  and  higher,  and  substitution  for  p  and  p'  C-4  is  then: 
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CC  =  ^  cosKt)  [1  -  /?^(0)  +  R/r)  ].  [C  -  5] 

Finally,  substitution  of  C-4  and  C-2  into  C-1  yields; 

^  [  1  +  «„(t)]C  I  -  R^(0)  +  cos  o>,T 

7  [C-6] 

y  C 1  -  /?^(0)  +  RJr)  +  R^{t)1  cos  co.t, 

where  terms  proportional  to  R„R^  are  ignored  due  to  their  higher  order  in  the 
smaller  quantities  nt  and  4). 

The  power  spectral  density  is  obtained  by  using  C-6  in  the  standard  defi¬ 
nition  of  the  power  spectrum  as  shown; 


/’(/)=  R{T)e^~‘^'’^"''dr. 


The  result  is; 
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/’(/)=  :^  (I -4)  dr 

— OO 

oo 

-OO 

oo 

+  \\  dr 

— OO 

+  Tf”W'‘" '''■'•”* 

— oo 

+  yj 

OO 


Using  the  definitions  of  Dirac  delta  functions  and  the  power  spectrum  C-7  be¬ 
comes: 


\lPmif-fr)  +  PJf  +  fc)  +  P^{f-fr)  +  ^(A  +  /.)]. 


[C-8] 


where  nl  =  /?*(0).  From  (.'-8  it  is  seen  how  the  discrete  spectral  lines  are  broad¬ 
ened  by  both  amplitude,  I\,  and  phase  modulations, 
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